TW202008302A

TW202008302A - DNA-based data access by converting the input data into a set of nucleotide sequences and synthesizing a set of nucleic acids including the set of nucleotide sequences

Info

Publication number: TW202008302A
Application number: TW107127162A
Authority: TW
Inventors: 樊隆
Original assignee: 大陸商南京金斯瑞生物科技有限公司
Priority date: 2018-08-03
Filing date: 2018-08-03
Publication date: 2020-02-16
Also published as: TWI770247B

Abstract

The present invention relates to DNA-based data storage. It provides an exemplary method for storing input data on nucleic acids, the method including: converting the input data into a set of nucleotide sequences, and synthesizing a set of nucleic acids including the set of nucleotide sequences. The conversion includes a data processing step and a nucleotide encoding step. The data processing step includes converting the input data into a binary string. The nucleotide encoding step includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences.

Description

DNA-based data access

The present invention generally relates to data storage and retrieval, and more specifically to techniques for reliable and efficient DNA-based data storage and retrieval.

The idea of using DNA as a tool for data storage and retrieval can be traced back to 1988, when Joe Davis and his collaborators created a synthetic DNA called "Microvenus" that encoded an icon and was incorporated into E. coli cells. Compared with traditional storage media such as magnetic tape and hard disks, DNA-based storage offers higher density (for example, ~1 mm³ to store 1 EB of data), a longer storage life (for example, more than one million years at -18 °C), and lower maintenance costs. DNA storage is a forward-looking field of research that relies on oligonucleotide synthesis to generate the DNA storage medium (especially high-throughput synthesis platforms such as CustomArray) and on sequencing for information retrieval (especially next-generation sequencing (NGS), such as the Illumina HiSeq 2500 and MiSeq).

At present, however, DNA-based data storage has many limitations. For example, DNA synthesis is costly, and data retrieval can be slow because it requires sequencing. DNA-based storage has therefore been considered better suited to large-scale archival storage, which involves fewer reads and writes of the storage medium. Further, many errors can be introduced at various stages of the process (such as encoding, writing, storage, decoding, reading, and retrieval), compromising the input and output of the data flow. Exemplary errors include mutations, deletions, and insertions in DNA fragments introduced during synthesis and sequencing, loss of fragments, and denaturation after long-term storage. In addition, when DNA is used to store large amounts of data, achieving random access to a portion of the data, rather than retrieving all of it, can be challenging.

The present invention relates to techniques for reliable and efficient DNA-based data storage and retrieval. Specifically, the invention provides accurate, efficient, and reliable methods for storing input data on nucleic acids (e.g., deoxyribonucleic acid ("DNA")). In particular, the invention uses a novel 5-bit transcoding framework to convert one or more data files into nucleic acid sequences (e.g., DNA sequences). The invention further provides an integrated process that includes a compression algorithm, an error correction algorithm, and the transcoding framework for efficient and reliable data storage and retrieval. In addition, the invention allows random data access, which is particularly beneficial when large-scale data is stored together but only part of the information needs to be viewed at a given time. Data that can be stored according to the methods of the invention includes any type of data that can be represented digitally (i.e., as binary data), including, for example, text files, high-definition video, images, and/or audio.

In some embodiments, a method for storing input data on nucleic acids is provided, the method including: a) converting the input data into a set of nucleotide sequences, wherein the conversion includes i) a data processing step that includes converting the input data into a binary string, and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids including the set of nucleotide sequences.

In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences.

In some embodiments, the data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings.

In some embodiments, the nucleotide encoding step includes converting each 5-bit binary string into an integer in the range 0 to 31 to obtain an integer string.

In some embodiments, the nucleotide encoding step further includes converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences.

In some embodiments, the nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences of a predetermined length.

In some embodiments, the length of each of the plurality of initial integer subsequences is determined based on the oligomer length of the selected synthesis platform, the required error tolerance, the size of the input data, the selected error correction code, or a combination thereof.

In some embodiments, the nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, the index information added to each of the plurality of initial integer subsequences includes a sequence of integers, wherein the length of the sequence of integers is based on the size of the input data.

In some embodiments, the nucleotide encoding step includes, after adding the index information, adding redundancy data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy.

In some embodiments, adding redundancy data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is larger than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is larger than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining the plurality of integer subsequences with redundancy based on the filled matrix.

In some embodiments, the number of columns of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the number of rows of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the error correction coding is Reed-Solomon ("RS") coding.

In some embodiments, the data generated by applying the error correction coding is generated by applying string correction and/or block correction of the RS coding.

In some embodiments, the 5-bit transcoding framework is based on Table 2.

In some embodiments, the selection of R and Y is based on: 1) being different from the nucleotide immediately preceding the R or Y; and/or 2) the estimated GC content of the nucleotide sequence.

In some embodiments, the input data corresponds to a compressed file. In some embodiments, the input data corresponds to two or more files.

In some embodiments, the input data corresponds to a text file.

In some embodiments, the data processing step further includes compressing the input data to obtain a compressed file and converting the compressed file into the binary string.

In some embodiments, the compressed file is compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

In some embodiments, the data processing step further includes grouping two or more files into a TAR file.

In some embodiments, the TAR file is further compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

In some embodiments, the nucleotide encoding step further includes appending a pair of primer sequences to the 5' and 3' ends of each nucleotide sequence in the set of nucleotide sequences.

In some embodiments, a primer pair is attached to the set of synthesized nucleic acids.

In some embodiments, a method for storing two or more sets of input data on nucleic acids is provided, the method including: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively, according to any one of the methods described herein; b) appending a pair of primer sequences to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, respectively, wherein the primer pairs of the two or more corresponding sets of nucleotide sequences differ from one another; and c) synthesizing two or more sets of nucleic acids including the two or more corresponding sets of nucleotide sequences, respectively.

In some embodiments, each primer pair has a sequence different from any of the two or more corresponding sets of nucleotide sequences or their complementary sequences.

In some embodiments, the GC content of the set of synthesized nucleic acids ranges from 30% to 70%. In some embodiments, the GC content of the set of synthesized nucleic acids is less than about 70%.

In some embodiments, the set of synthesized nucleic acids is stored. In some embodiments, the set of synthesized nucleic acids is stored by drying. In some embodiments, the set of synthesized nucleic acids is stored by lyophilization.

In some embodiments, the set of synthesized nucleic acids is immobilized on a support. In some embodiments, the support is a microarray.

In some embodiments, a method for retrieving output data stored on nucleic acids is provided, the method including: a) obtaining a set of nucleotide sequences of a set of nucleic acids; and b) converting the set of nucleotide sequences into the output data, wherein the conversion includes: i) a nucleotide decoding step that includes converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step that includes converting the binary string into the output data, thereby obtaining the output data.

In some embodiments, the set of nucleic acids is amplified before the output data is retrieved.

In some embodiments, the set of nucleic acids is sequenced to generate a plurality of sequence reads.

In some embodiments, the plurality of sequence reads are paired, merged, and filtered to obtain the set of nucleotide sequences.

In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided, the method including: i) a nucleotide decoding step that includes converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step that includes converting the binary string into the output data.

In some embodiments, the nucleotide decoding step includes converting the set of nucleotide sequences into a plurality of integer subsequences consisting of integers in the range 0 to 31.

In some embodiments, the nucleotide decoding step further includes applying error correction coding to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, applying the error correction coding includes: i) applying RS-coded string correction to the plurality of integer subsequences to obtain a plurality of consistent integer subsequences; and ii) applying RS-coded block correction to the plurality of consistent integer subsequences to obtain the plurality of indexed integer subsequences.

In some embodiments, the nucleotide decoding step further includes removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences.

In some embodiments, the nucleotide decoding step further includes merging the core integer subsequences into an integer string.

In some embodiments, the nucleotide decoding step further includes converting the integer string into the binary string.
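By way of illustration only, the last three decoding operations (removing the index, merging the core integer subsequences, and converting the integer string into a binary string) may be sketched in Python as follows. The sketch rests on assumptions not stated in this section: the index is taken to be a prefix of `index_len` base-32 digits with the most significant digit first, and any trailing bits that do not fill a whole byte are treated as padding and discarded. All function names are hypothetical.

```python
def merge_core_subsequences(indexed, index_len):
    """Order indexed subsequences by their index prefix, strip it, and merge."""
    # Assumption: the index is a prefix, most significant digit first, so
    # lexicographic ordering of the prefixes equals numeric ordering.
    ordered = sorted(indexed, key=lambda seq: seq[:index_len])
    return [value for seq in ordered for value in seq[index_len:]]


def integers_to_bits(integers, width=5):
    """Render each integer (0 to 31 for width 5) as a fixed-width binary string."""
    return "".join(f"{value:0{width}b}" for value in integers)


def bits_to_bytes(bits):
    """Convert a binary string to bytes, dropping trailing padding bits."""
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

For example, two out-of-order subsequences carrying the integers 8 and 4 merge to the bit string 0100000100, whose first eight bits are the byte for the letter A.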

In some embodiments, the output data is stored in a compressed file. In some embodiments, the data processing step further includes decompressing the compressed file. In some embodiments, the decompression is performed using the LZMA algorithm.

In some embodiments, the output data corresponds to a plurality of files. In some embodiments, the plurality of files are extracted from the output data using the TAR algorithm.

In some embodiments, the set of nucleic acids includes primer sequences at the 5' and 3' ends, and the method includes removing the primer sequences before the nucleotide decoding step.

In some embodiments, a method for retrieving output data stored on a set of nucleic acids of interest is provided, wherein the set of nucleic acids of interest is one of multiple sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different primer pair at the 3' and 5' ends, the method including: a) amplifying the set of nucleic acids using the primer pair corresponding to the nucleic acids of interest; b) obtaining a set of nucleotide sequences of the amplified nucleic acids; and c) converting the set of nucleotide sequences into the output data according to the methods of the above embodiments, thereby obtaining the output data.

In some embodiments, a method for retrieving two or more sets of output data stored on two or more corresponding sets of nucleic acids of interest is provided, wherein the two or more sets of nucleic acids of interest are among a plurality of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different primer pair at the 3' and 5' ends, the method including: a) amplifying (e.g., separately or together) the two or more sets of nucleic acids of interest using the primer pairs corresponding to the two or more sets of nucleic acids of interest; b) obtaining two or more sets of nucleotide sequences of the amplified nucleic acids; and c) converting the two or more sets of nucleotide sequences into two or more sets of output data, respectively, according to any one of the methods described herein, thereby obtaining the two or more sets of output data.

In some embodiments, a non-transitory computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by one or more processors of an electronic device, cause the electronic device to perform any one of the methods described herein.

The present invention further provides a system for providing nucleic acid-based data storage or for retrieving data from nucleic acids, including: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods described herein.

The present invention further provides an electronic device for providing nucleic acid-based data storage or for retrieving data from nucleic acids, the device including means for performing any one of the methods described herein.

The present invention provides accurate, efficient, and reliable methods for storing input data on nucleic acids (e.g., deoxyribonucleic acid ("DNA")). Specifically, the invention uses a novel 5-bit transcoding framework to convert one or more data files into nucleic acid sequences (e.g., DNA sequences). This novel 5-bit transcoding framework allows efficient nucleic acid sequence design that strikes the right GC content, avoids certain homopolymers (e.g., homopolymers of 4 or more nucleotides in length), and reduces the error rate in nucleic acid synthesis and amplification. The invention further provides an integrated process that includes a compression algorithm, an error correction algorithm, and the transcoding framework for efficient and reliable data storage and retrieval. The methods provided by the invention can be used to store data of any size, including large files. In addition, the invention allows random data access, which is particularly beneficial when large-scale data is stored together but only part of the information needs to be viewed at a given time. Data that can be stored according to the methods of the invention includes any type of data that can be represented digitally (i.e., as binary data), including, for example, text files, high-definition video, images, and/or audio.
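By way of illustration only, the two sequence-design constraints named above (GC content and homopolymer length) may be checked with a short Python sketch such as the following. The 30% to 70% GC window and the limit on runs of 4 or more identical nucleotides are taken from this document; the function names and the choice to express them as defaults are assumptions made for the example, and the sketch is a checker, not the transcoding framework itself.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def passes_constraints(seq: str, gc_min: float = 0.30, gc_max: float = 0.70,
                       max_run: int = 3) -> bool:
    """True if GC content is within [gc_min, gc_max] and no run exceeds max_run."""
    return (gc_min <= gc_content(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)
```

A checker of this kind could be applied to each candidate nucleotide sequence after encoding, for example to reject a sequence containing the run AAAA.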

[FIG. 1] shows an exemplary process for providing DNA-based data storage and retrieval according to some embodiments.

[FIG. 2] shows an exemplary approach for processing a compressed file for DNA-based data storage according to some embodiments.

[FIG. 3A] shows exemplary steps for adding an index and redundancy data to digital content to be stored according to some embodiments.

[FIG. 3B] depicts exemplary steps for adding an index and redundancy data to digital content to be stored according to some embodiments.

[FIG. 3C] depicts exemplary steps for adding an index and redundancy data to digital content to be stored according to some embodiments.

[FIG. 3D] depicts exemplary steps for adding an index and redundancy data to digital content to be stored according to some embodiments.

[FIG. 4] shows an exemplary approach for processing a compressed file for DNA-based data storage according to some embodiments.

[FIG. 5] shows an exemplary 5-bit transcoding framework according to some embodiments.

[FIG. 6] shows an exemplary portion of text to be stored and retrieved according to some embodiments.

[FIG. 7] shows an exemplary implementation of the DNA-based data storage and retrieval technique according to some embodiments.

[FIG. 8] depicts an exemplary electronic device according to some embodiments.

[FIG. 9A] shows an exemplary process for providing DNA-based data storage according to some embodiments.

[FIG. 9B] shows an exemplary process for providing DNA-based data retrieval according to some embodiments.

Accordingly, in one aspect the present invention provides a method for storing input data on a set of nucleic acids and a method for converting input data into a set of nucleotide sequences. In another aspect, a method for retrieving output data stored on nucleic acids and a method for converting a set of nucleotide sequences into output data are provided. Further provided are a system and a non-transitory computer-readable storage medium storing one or more programs for performing any one or more steps of the methods described herein.

It should be understood that the embodiments of the invention described herein include "consisting of" and/or "consisting essentially of" embodiments.

Reference herein to "about" a value or parameter includes (and describes) variations directed to that value or parameter per se. For example, a description referring to "about X" includes a description of "X".

As used herein, reference to "not" a value or parameter generally means and describes "other than" the value or parameter. For example, a statement that the method is not used to treat cancer of type X means that the method is used to treat types of cancer other than X.

As used herein and in the appended claims, the singular forms include plural referents unless the context clearly dictates otherwise.

As used herein and in the appended claims, "a set" refers to one or more referents unless the context clearly dictates otherwise. A set of nucleic acids may be nucleic acids encoding the data of the same file, or of the same group of files compressed together. In some embodiments, nucleic acids of the same file may have the same set of primers appended to the 5' and 3' ends.

Methods of encoding data and data storage

In one aspect, the present invention provides a method (e.g., a computer-implemented method) for converting input data into a set of nucleotide sequences. The method generally includes a data processing step, which converts the input data into a binary string, and a nucleotide encoding step, which converts the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The method can be used to store input data on a set of nucleic acids, which involves first converting the input data into a set of nucleotide sequences and then synthesizing a set of nucleic acids including the set of nucleotide sequences.

The input data may represent any number of files of any type, such as text files, image files, and video/audio files (e.g., high-definition files). A file may be uncompressed or compressed. An uncompressed file may be compressed before it is converted into a binary string. For example, the Lempel-Ziv-Markov chain algorithm may be used to compress the file into an LZMA file (e.g., A.lzma). In some embodiments, two or more files (e.g., three, four, five, six, or more files) are first grouped together, for example into a TAR file (e.g., A.tar), and the TAR file is further compressed into an LZMA file (e.g., A.tar.lzma). In this way, the method can allow multiple files (e.g., 1-5, 5-10, 10-15, 15-25, 25-35, or 35-50) to be stored in a single nucleic acid composition.
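By way of illustration only, grouping several files into a TAR archive and compressing it with LZMA may be sketched with the Python standard library as follows. The function names `pack_files` and `unpack_files` are hypothetical, and the choice of the legacy .lzma container (`lzma.FORMAT_ALONE`), made here to match the A.tar.lzma naming, is an assumption; the source does not specify compression settings.

```python
import io
import lzma
import tarfile


def pack_files(files: dict) -> bytes:
    """Group named byte payloads into an in-memory TAR archive, then LZMA-compress it."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    # FORMAT_ALONE writes the legacy .lzma container (an assumption of this sketch).
    return lzma.compress(buffer.getvalue(), format=lzma.FORMAT_ALONE)


def unpack_files(blob: bytes) -> dict:
    """Reverse pack_files: decompress, then extract every regular file."""
    raw = lzma.decompress(blob)  # default FORMAT_AUTO accepts .xz and .lzma
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }
```

The compressed bytes returned by `pack_files` would then serve as the input to the conversion into a binary string.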

在一些實施型態中，允許針對單個檔中的位置的隨機存取，該單個檔可被分為多組資料，且將多組資料各自進行如下所述的壓縮和處理。例如，對應於具有10個章節的書的數字化檔可被分為10個檔，每個檔對應於單個章節。然後將該十個檔案壓縮和處理以實現任何一個章節的自由存取。 In some implementations, random access to positions in a single file is allowed, the single file can be divided into multiple sets of data, and the multiple sets of data are each compressed and processed as described below. For example, a digitized file corresponding to a book with 10 chapters can be divided into 10 files, each file corresponding to a single chapter. Then compress and process the ten files to achieve free access to any chapter.

資料處理步驟將輸入資料轉換成二進位串。該二進位串可例如通過遵循本發明所記載之5比特轉碼框架直接轉換成核苷酸序列組。作為選擇，該二進位串可被進一步轉換成整數串，然後其例如通過遵循5比特轉碼框架轉換成核苷酸序列組。在一些實施型態中，進一步對該整數串進行錯誤校正編碼和/或其他處理，以生成具有冗餘的多個整數子序列，然後具有冗餘的多個整數子序列例如通過遵循5比特轉碼框架轉換成核苷酸序列組。 The data processing step converts the input data into a binary string. The binary string can be directly converted into a set of nucleotide sequences, for example, by following the 5-bit transcoding framework described in the present invention. Alternatively, the binary string can be further converted into an integer string, which is then converted into a set of nucleotide sequences, for example, by following a 5-bit transcoding framework. In some embodiments, the integer string is further subjected to error correction coding and/or other processing to generate multiple integer subsequences with redundancy, and then multiple integer subsequences with redundancy, for example, by following the 5-bit conversion The code frame is converted into a set of nucleotide sequences.

因此，例如，在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的方法(例如電腦實現方法)，其中該轉換包括：i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換前述二進位串以獲得核苷酸序列組。在一些實施型態中，提供一種用於在核酸上儲存輸入資料的方法，該方法包括：a)將輸入資料轉換成核苷酸序列組，其中該轉換包括i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換前述二進位串以獲得核苷酸序列組；以及b)合成包括前述核苷酸序列組的核酸組。 Therefore, for example, in some embodiments, a method (for example, a computer-implemented method) for converting input data into a set of nucleotide sequences is provided, wherein the conversion includes: i) a data processing step, including converting the input data Into a binary string; and ii) a nucleotide encoding step, which includes converting the aforementioned binary string using a 5-bit transcoding framework to obtain a set of nucleotide sequences. In some embodiments, a method for storing input data on a nucleic acid is provided. The method includes: a) converting the input data into a set of nucleotide sequences, wherein the conversion includes i) a data processing step, including converting the input data The data is converted into a binary string; and ii) a nucleotide coding step, which includes converting the aforementioned binary string using a 5-bit transcoding framework to obtain a nucleotide sequence group; and b) synthesizing a nucleic acid group including the aforementioned nucleotide sequence group .

In some embodiments, the data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings, each of which can be further converted into an integer in the range of 0 to 31 to obtain an integer string. The integer string can be converted directly into a set of nucleotide sequences, for example using the 5-bit transcoding framework. Alternatively, the integer string is subjected to further operations as described below.
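The splitting of a binary string into 5-bit groups can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `bytes_to_int5` is hypothetical, and padding the tail with zero bits is an assumption, since the text does not say how an incomplete final group is handled.

```python
def bytes_to_int5(data):
    """Split the bits of `data` into non-overlapping 5-bit groups (values 0..31)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Assumption: pad the tail with zero bits so the length is a multiple of 5.
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]


# One byte (8 bits) becomes two 5-bit integers after padding.
print(bytes_to_int5(b"A"))
```

Every returned value lies in the range 0 to 31, so each one can later be mapped to a single trimer.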

Specifically, the integer string can be divided into a plurality of initial integer subsequences having a predetermined length. The predetermined length is calculated from several factors, including the oligomer length of the synthesis platform, the selected error correction code, the required error tolerance, the synthesis error rate of the oligomers, and/or the total size of the encoded data, as discussed in detail below. For example, the integer string can be divided into a list of non-overlapping integer subsequences using a sliding window of fixed length (e.g., 22 integers). An index can then be added to each of the initial integer subsequences to generate a plurality of indexed integer subsequences. The index may consist of integers that are likewise in the range of 0 to 31. The length of the index is flexible and depends on the yield of DNA synthesis and the size of the data.
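A minimal sketch of the windowing and indexing described above is given here. The names `add_indexes`, `chunk_len`, and `index_len` are hypothetical, and three choices are assumptions rather than statements of the source: the index is prepended, it is written as base-32 digits of the subsequence number, and the last window is zero-padded.

```python
def add_indexes(ints, chunk_len=22, index_len=2):
    """Cut `ints` into fixed-length windows and prepend a base-32 index to each."""
    indexed = []
    for start in range(0, len(ints), chunk_len):
        chunk = list(ints[start:start + chunk_len])
        # Assumption: zero-pad the final window to the fixed length.
        chunk += [0] * (chunk_len - len(chunk))
        number = start // chunk_len
        if number >= 32 ** index_len:
            raise ValueError("index_len is too small for this amount of data")
        # Index digits are themselves integers in the range 0..31.
        index = [(number // 32 ** k) % 32 for k in reversed(range(index_len))]
        indexed.append(index + chunk)
    return indexed


indexed = add_indexes(list(range(32)) * 2)
print(len(indexed), indexed[2][:2])
```

With two index digits, up to 32 × 32 = 1024 subsequences can be addressed; a larger data set would need a longer index, which is consistent with the statement that the index length depends on the data size.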

In some embodiments, redundant data is added to generate a plurality of integer subsequences with redundancy. For example, Reed-Solomon (RS) error correction coding is applied to the plurality of integer subsequences to generate a new list of integer subsequences with redundancy through RS-coded string correction and block correction. Redundancy refers to synthesizing excess oligomers to provide robustness to dropout. The redundancy in string correction supports correction of transition and transversion errors in the oligomers. The redundancy in block correction enables correction of insertions, deletions, and complete loss of information.

In an exemplary embodiment, adding redundant data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining a plurality of integer subsequences with redundancy based on the filled matrix. The number of columns and/or rows of the empty matrix can be determined based on the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof. The error correction coding may be Reed-Solomon ("RS") coding. In some embodiments, the data generated by applying error correction coding is generated by applying RS-coded string correction and RS-coded block correction.
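The matrix construction can be illustrated with a small systematic Reed-Solomon encoder over GF(32), whose 32 symbols match the integer range 0 to 31. This is a sketch under several assumptions that the text does not fix: the primitive polynomial x^5 + x^2 + 1, the parity counts, the layout (each indexed subsequence stored along one axis, here as a row), and the reading of "string" parity as parity appended to each subsequence and "block" parity as extra subsequences computed across a block. All function names are hypothetical, and only encoding is shown.

```python
PRIM_POLY = 0b100101  # x^5 + x^2 + 1 (assumed choice of primitive polynomial)

# Exponent and logarithm tables for GF(32).
EXP, LOG = [0] * 62, [0] * 32
_x = 1
for _i in range(31):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0b100000:
        _x ^= PRIM_POLY
for _i in range(31, 62):
    EXP[_i] = EXP[_i - 31]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(nsym):
    """Product of (x + alpha^i) for i in 0..nsym-1, highest degree first."""
    g = [1]
    for i in range(nsym):
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= c
            nxt[j + 1] ^= gf_mul(c, EXP[i])
        g = nxt
    return g


def rs_encode(msg, nsym):
    """Systematic encoding: return msg followed by nsym parity symbols."""
    if len(msg) + nsym > 31:
        raise ValueError("codeword longer than 31 symbols is not valid in GF(32)")
    g = generator_poly(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return list(msg) + rem[len(msg):]


def syndromes(word, nsym):
    """Evaluate the codeword at alpha^0..alpha^(nsym-1); all zero means consistent."""
    out = []
    for i in range(nsym):
        y = 0
        for c in word:
            y = gf_mul(y, EXP[i]) ^ c
        out.append(y)
    return out


def add_redundancy(indexed, row_parity=4, col_parity=2):
    """Fill a larger matrix with the subsequences plus string and block parity."""
    rows = [rs_encode(r, row_parity) for r in indexed]           # string parity
    width = len(rows[0])
    cols = [rs_encode([r[c] for r in rows], col_parity) for c in range(width)]
    return [[cols[c][r] for c in range(width)] for r in range(len(rows) + col_parity)]


block = [[(7 * r + c) % 32 for c in range(24)] for r in range(5)]
matrix = add_redundancy(block)
print(len(matrix), len(matrix[0]))
```

The filled matrix is larger than the input along both axes, as the paragraph above requires, and each line of the matrix along the subsequence axis becomes one integer subsequence with redundancy. Decoding (error location and correction) is considerably longer and is not shown.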

In some embodiments, the nucleotide encoding step further includes appending a pair of primer sequences to the 5' and 3' ends of the set of nucleotide sequences. The primers can be used to amplify the set of nucleic acids, for example by PCR amplification. In some embodiments, the primer sequences are added to the set of nucleotide sequences before synthesis. Alternatively, the primers can be attached to the synthesized nucleic acids, for example by ligation.

The foregoing method can be used to store two or more sets of input data on nucleic acids. Specifically, the method includes: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences, respectively; b) appending a pair of primer sequences to the 5' and 3' ends of each of the two or more corresponding sets of nucleotide sequences, wherein the primers for each of the sets differ from one another; and c) synthesizing sets of nucleic acids including the two or more corresponding sets of nucleotide sequences, respectively. Each primer pair may have a sequence that differs from any of the two or more corresponding nucleotide sequences or their complementary sequences.

The synthesized nucleic acids may have a GC content of about 30% to about 70%. For example, the synthesized nucleic acids may have a GC content of any of about 40% to about 60%, about 30% to about 40%, about 40% to about 50%, about 50% to about 60%, or about 60% to about 70%. In some embodiments, the synthesized nucleic acids contain no homopolymer longer than 3 nucleotides (e.g., no homopolymer of 4, 5, 6, 7, 8, 9, or 10 nucleotides). In some embodiments, the nucleic acids are oligonucleotides, for example oligonucleotides of about 50, 150, 200, 300, or 400 nucleotides in length. In some embodiments, the set of nucleic acids includes about any of 1, 2, 3, 5, 10, 15, or more oligonucleotides.
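The two sequence constraints above (GC content and maximum homopolymer length) are straightforward to check in software before synthesis. The following is a minimal sketch; the function names and default thresholds are illustrative choices taken from the ranges stated above, not a prescribed procedure.

```python
from itertools import groupby


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def passes_constraints(seq, gc_min=0.30, gc_max=0.70, max_run=3):
    return gc_min <= gc_content(seq) <= gc_max and longest_homopolymer(seq) <= max_run


print(passes_constraints("ACGTACGTAGCT"), passes_constraints("AAAATTTTAAAA"))
```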

In some embodiments, the method further includes storing the set of synthesized nucleic acids. In some embodiments, the set of nucleic acids is stored after drying, for example by lyophilization. The set of nucleic acids can be stored as a dry composition, including a lyophilized composition. In some embodiments, the set of nucleic acids is immobilized on a support, including a solid support such as a microarray. In some embodiments, the nucleic acids are stored on a microarray at a density of about 5 μg per 1 inch × 3 inch area (e.g., on a CustomArray 12K chip). In some embodiments, the size of the input data is at least about 50 MB.

Methods for decoding nucleic acid sequences and retrieving data

In another aspect, the present invention provides a method (e.g., a computer-implemented method) for converting a set of nucleotide sequences into output data. The method is essentially the reverse of the encoding procedure and generally includes a nucleotide decoding step, which converts the set of nucleotide sequences into a binary string, for example by using the 5-bit transcoding framework, and a data processing step, which converts the binary string into the output data. The method can be used to retrieve output data stored on a set of nucleic acids, which involves obtaining the nucleotide sequences of the set of nucleic acids and then converting the set of nucleotide sequences into the output data.

In some embodiments, the set of nucleic acids is first amplified, for example by using the primer sequences present at the 3' and 5' ends of the set of nucleic acids. The amplified nucleic acids can then be sequenced, for example by next-generation sequencing. Next-generation sequencing technologies are known to those of ordinary skill in the art. For example, the nucleic acids are sequenced using the Illumina sequencing method. Sequences belonging to a particular file can be obtained by aligning the primer sequences. In some embodiments, the method includes NGS library preparation. When the set of nucleic acids is present in a mixture that includes different sets of nucleic acids encoding different data, the set of interest can be specifically amplified using its unique primer pair, thereby allowing random access to the data corresponding to that set. If several compressed files need to be read and decoded in a single next-generation sequencing run, all of their corresponding sets of nucleic acids are amplified by PCR using all of the corresponding primer pairs.

In some embodiments, the method includes paired-end next-generation sequencing followed by read pairing and merging, in which the forward and reverse reads from a single cluster are paired and merged into a single read, and all merged reads of irregular length are filtered out. In addition, all reads can be grouped by compressed file according to their primer sequences. The primers can then be removed, and the nucleotide sequences can be converted into a plurality of integer subsequences consisting of integers in the range of 0 to 31, or directly into a binary string, which is subsequently converted into the output data.
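Grouping merged reads by primer pair, stripping the primers, and discarding reads of irregular length can be sketched as below. The function name, the file labels, and the short primer sequences are hypothetical placeholders. Exact primer matching is a simplifying assumption; a practical pipeline would likely tolerate mismatches by alignment.

```python
def split_by_primers(reads, primer_pairs, expected_len=None):
    """Group reads by primer pair and return the payload between the primers.

    primer_pairs maps a file label to (forward, reverse), where `reverse` is
    written as it appears at the 3' end of the merged read (an assumption).
    """
    groups = {label: [] for label in primer_pairs}
    for read in reads:
        for label, (fwd, rev) in primer_pairs.items():
            long_enough = len(read) >= len(fwd) + len(rev)
            if long_enough and read.startswith(fwd) and read.endswith(rev):
                payload = read[len(fwd):len(read) - len(rev)]
                # Filter out reads whose payload has an irregular length.
                if expected_len is None or len(payload) == expected_len:
                    groups[label].append(payload)
                break
    return groups


primers = {"file1": ("ACGT", "TTGA"), "file2": ("GGCA", "CCAT")}  # placeholders
reads = ["ACGTAAGCATTTGA", "GGCATTACGACCAT", "ACGTAAGTTGA", "TTTTTTTTTTTTTT"]
print(split_by_primers(reads, primers, expected_len=6))
```

Because each stored file carries its own primer pair, the same grouping step is what allows several files sequenced in one run to be separated afterwards.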

In some embodiments, the method further includes applying error correction to the plurality of integer subsequences to obtain a plurality of indexed integer subsequences. In an exemplary embodiment, the step of applying error correction includes: i) applying RS-coded string correction to the plurality of integer subsequences to obtain a plurality of consensus integer subsequences; and ii) applying RS-coded block correction to the plurality of consensus integer subsequences to obtain the plurality of indexed integer subsequences. Because a nucleic acid can have many molecular copies produced during synthesis and can be sequenced many times, many reads may represent a single nucleic acid. These reads may vary because of errors introduced during high-throughput synthesis and sequencing, but correct reads that exactly match the originally designed nucleic acid still predominate in count. Through highest-frequency-based correction at each position of the integer string, all integer strings sharing the same index can be corrected and merged into a consensus integer string between string correction and block correction.
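The highest-frequency-based merging step can be sketched as a per-position majority vote over all integer strings that share an index. The function name and the two-integer index length are assumptions, as is the choice to discard strings whose length differs from the most common length within a group. Errors inside the index itself are not handled here.

```python
from collections import Counter, defaultdict


def consensus_by_index(int_strings, index_len=2):
    """Merge integer strings sharing an index into one consensus string each."""
    groups = defaultdict(list)
    for s in int_strings:
        groups[tuple(s[:index_len])].append(s)

    consensus = {}
    for index, members in groups.items():
        # Keep only strings of the most common length so positions line up.
        length = Counter(len(m) for m in members).most_common(1)[0][0]
        aligned = [m for m in members if len(m) == length]
        # At each position, take the most frequent integer.
        consensus[index] = [Counter(col).most_common(1)[0][0] for col in zip(*aligned)]
    return consensus


reads = [[0, 1, 5, 6, 7], [0, 1, 5, 9, 7], [0, 1, 5, 6, 7], [0, 2, 3, 3, 3]]
print(consensus_by_index(reads))
```

The vote works because, as noted above, correct reads are expected to outnumber any particular erroneous variant.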

The indexes can then be removed from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences. The core integer subsequences can then be concatenated into a complete integer string, which is converted into a binary string. The binary string can then be written to a file, for example a compressed file. The compressed file can subsequently be decompressed, for example using the LZMA algorithm. If the decompressed file includes data corresponding to multiple files, it is further processed (e.g., extracted) by the TAR algorithm to obtain the multiple files.
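The final recovery steps can be sketched with the Python standard library modules `lzma` and `tarfile`. The function names and the two-integer index length are hypothetical. The sketch assumes the indexed subsequences have already been error-corrected, that the archive is a TAR file compressed in the `.xz` container that `lzma` detects automatically, and that any padding integers at the end can simply be ignored.

```python
import io
import lzma
import tarfile


def int5_to_bytes(ints):
    """Concatenate 5-bit integers into a bit string and cut it into bytes."""
    bits = "".join(f"{v:05b}" for v in ints)
    bits = bits[:len(bits) - len(bits) % 8]  # drop an incomplete trailing byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def recover_files(indexed, index_len=2):
    """Order by index, strip indexes, decompress, and unpack the archive."""
    ordered = sorted(indexed, key=lambda s: s[:index_len])
    core = [v for s in ordered for v in s[index_len:]]
    compressed = int5_to_bytes(core)
    # A decompressor object stops at the end of the compressed stream, so
    # trailing padding bytes end up in `unused_data` instead of causing errors.
    archive = lzma.LZMADecompressor().decompress(compressed)
    files = {}
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files
```

Sorting by the index restores the original order even when the sequencing reads arrive in arbitrary order, which is the reason the index is stored on every oligonucleotide.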

In some embodiments, the method can be used to retrieve output data stored on a set of nucleic acids of interest, wherein the set of nucleic acids of interest is one of multiple sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different primer pair at the 3' and 5' ends. The method includes: a) amplifying the set of nucleic acids of interest using the primer pair corresponding to that set; b) obtaining the set of nucleotide sequences of the amplified nucleic acids; and c) converting the set of nucleotide sequences into the output data according to the method of any of the embodiments above, thereby obtaining the output data.

In some embodiments, there is provided a method for retrieving two or more sets of output data stored on two or more corresponding sets of nucleic acids of interest, wherein the sets of nucleic acids of interest are among multiple sets of nucleotide sequences present in a mixture, each set encoding a different set of output data and having a different primer pair at the 3' and 5' ends, the method including: a) amplifying (e.g., separately or together) the two or more sets of nucleic acids of interest using the primer pairs corresponding to those sets; b) obtaining two or more sets of nucleotide sequences of the two or more sets of amplified nucleic acids; and c) converting the two or more sets of nucleotide sequences into two or more sets of output data, respectively, thereby obtaining the two or more sets of output data.

5-bit transcoding framework

The methods of the present invention use a novel 5-bit transcoding framework to convert a binary string or an integer string into a set of nucleotide sequences. "5-bit transcoding framework" refers to conversion according to Table 1 below. In general, every 5 consecutive bits of a binary string can be expressed as an integer between 0 and 31 and then as 3 nucleotides (i.e., a trimer). For example, nucleic acids have four bases (e.g., A, T, G, and C), so there are 16 possible dimers (i.e., NN): AA, AT, AG, AC, TA, TT, TG, TC, GA, GT, GG, GC, CA, CT, CG, and CC. If a degenerate base R or Y follows the dimer, the trimers (NNR/NNY) comprise 32 types, which correspond to the 32 integers in the range of 0 to 31 and thereby allow a binary string to be converted into a DNA sequence.

In some embodiments, R is selected from any two of A, T, G, and C, and Y is selected from the corresponding other two. In some embodiments, R is selected from A and G, and Y is selected from T and C. In some embodiments, R is selected from A and C, and Y is selected from T and G. In some embodiments, R is selected from T and G, and Y is selected from A and C. In some embodiments, R is selected from T and C, and Y is selected from A and G.

The choice of the nucleotide corresponding to R or Y can depend on the preceding base, for example to maintain the desired GC content and/or to avoid homopolymers. For example, in a scheme in which R is selected from A and G and Y is selected from C and T, whether A or G is chosen for R and whether C or T is chosen for Y depends on the preceding base (i.e., the second base of the trimer). In some embodiments, R and Y are chosen so that the second and third bases are not identical. In some embodiments, R and Y are chosen to maintain the desired GC balance. As long as these rules are followed, R and Y can be chosen randomly. The coding potential of this transcoding framework is 1.67 bits per nucleotide (i.e., 5 bits per 3 nt).
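A minimal sketch of the NNR/NNY idea follows, with R drawn from A and G and Y drawn from C and T. The mapping from integers to dimers used here (upper four bits select the dimer, lowest bit selects R or Y) and the dimer ordering are hypothetical and are not the table of the invention. Choosing the third base deterministically as the first candidate that differs from the second base is likewise an assumption; the text allows a random choice within the rules.

```python
BASES = "ATGC"
DIMERS = [a + b for a in BASES for b in BASES]  # 16 dimers; ordering is assumed
R_SET, Y_SET = "AG", "CT"


def encode_trimer(value):
    """Map an integer 0..31 to a trimer: dimer (4 bits) plus R or Y (1 bit)."""
    dimer = DIMERS[value >> 1]
    candidates = R_SET if value & 1 == 0 else Y_SET
    # Rule from the text: the third base must differ from the second base.
    third = next(base for base in candidates if base != dimer[1])
    return dimer + third


def decode_trimer(trimer):
    """Recover the integer; only the class (R or Y) of the third base matters."""
    low_bit = 0 if trimer[2] in R_SET else 1
    return (DIMERS.index(trimer[:2]) << 1) | low_bit


def encode(ints):
    return "".join(encode_trimer(v) for v in ints)


def decode(seq):
    return [decode_trimer(seq[i:i + 3]) for i in range(0, len(seq), 3)]


print(encode([0, 1, 31]), decode(encode([0, 1, 31])))
```

Because the third base always differs from the second, any run of identical bases can span at most the third base of one trimer and the first two bases of the next, so no homopolymer longer than 3 nucleotides can occur in the payload. The density follows from 5 bits being carried by 3 nucleotides, i.e. 5/3 ≈ 1.67 bits per nucleotide.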

Table 2 provides an exemplary 5-bit transcoding framework. In the specific example depicted in Table 2, Y is selected from C and T, and R is selected from A and G. It will be understood that other transcoding frameworks following the same principle can be used.

Table 2

Synthesis and storage of nucleic acids

Nucleic acids including the desired nucleotide sequences can be synthesized using any nucleic acid synthesis method. In some embodiments, the nucleic acids are synthesized by chemical synthesis. Methods for high-throughput nucleic acid synthesis are described in International Application No. WO 2002US40580, entitled "COMBINATORIAL SYNTHESIS ON ARRAYS", filed by Maurer et al. on February 17, 2002 and having publication No. WO 03052383, and published in December 2016 under the title "ELECTROCHEMICALLY GENERATED ACID AND ITS CONTAINMENT TO 100 MICRON REACTION AREAS FOR THE PRODUCTION OF DNA MICROARRAYS", which is incorporated herein by reference in its entirety.

Once synthesized, the nucleic acids can be stored in different media. In some embodiments, the nucleic acids are dried (e.g., lyophilized) and stored in vials. In some embodiments, the nucleic acids are immobilized on a support, for example a solid support such as a microarray.

Computer-readable storage media and systems

The present invention further provides a non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by one or more processors of an electronic device, cause the electronic device to perform one or more steps of any of the methods described herein.

In some embodiments, there is provided a system for providing nucleic acid-based data storage or for retrieving data from nucleic acids, the system including: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing one or more steps of any of the methods described herein.

In some embodiments, there is provided an electronic device for providing nucleic acid-based data storage or for retrieving data from nucleic acids, the device including means for performing any of the methods described herein.

Exemplary embodiments

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide encoding step further includes, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide encoding step further includes, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy. Adding redundant data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining a plurality of (integer) subsequences with redundancy based on the filled matrix.

In some embodiments, the number of columns of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the number of rows of the empty matrix is determined based on the oligomer length of the selected synthesis platform, the type of error correction code, a predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

In some embodiments, the error correction coding is Reed-Solomon ("RS") coding.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) a data processing step that includes converting the input data into a binary string; and ii) a nucleotide encoding step that includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The data processing step includes dividing the binary string into a sequence of non-overlapping 5-bit binary strings. The nucleotide encoding step includes converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using the 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes dividing the integer string into a plurality of initial integer subsequences having a predetermined length. The nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences. The nucleotide encoding step further includes, after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy. Adding redundant data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining a plurality of integer subsequences with redundancy based on the filled matrix. The data generated by applying error correction coding is generated by applying RS-coded string correction and/or RS-coded block correction.

In some embodiments, there is provided a computer-implemented method for converting input data into a set of nucleotide sequences, the method including: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences; and vi) after adding the index information, adding redundant data to the plurality of indexed integer subsequences to obtain a plurality of integer subsequences with redundancy, thereby obtaining the set of nucleotide sequences.

在一些實施型態中，提供一種用於在核酸上儲存輸入資料的方法，該方法包括：i)將輸入資料轉換成二進位串；ii)將二進位串分成非重疊的5比特二進位串的序列；iii)將每個5比特二進位串轉換成0至31範圍內的整數以獲得整數串以及使用5比特轉碼框架轉換該整數串；iv)將該整數串分成多個具有預定長度的初始整數子序列；v)添加索引資訊至多個初始整數子序列中的每一個以獲得多個具有索引的整數子序列；vi)在添加索引資訊之後，添加冗餘數據至多個具有索引的整數子序列，從而獲得多個具有冗餘的整數子序列，從而獲得核苷酸序列組；以及vii)合成包括該核苷酸序列組的核酸組。 In some embodiments, a method for storing input data on a nucleic acid is provided. The method includes: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the initial integer subsequences to obtain a plurality of indexed integer subsequences; vi) after adding the index information, adding redundant data to the indexed integer subsequences to obtain a plurality of integer subsequences with redundancy, thereby obtaining the set of nucleotide sequences; and vii) synthesizing a set of nucleic acids including the set of nucleotide sequences.

在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的電腦實現方法，該方法包括：i)將輸入資料轉換成二進位串；ii)將二進位串分成非重疊的5比特二進位串的序列；iii)將每個5比特二進位串轉換成0至31範圍內的整數以獲得整數串以及使用5比特轉碼框架轉換該整數串；iv)將該整數串分成多個具有預定長度的初始整數子序列；v)添加索引資訊至多個初始整數子序列中的每一個以獲得多個具有索引的整數子序列；vi)創建空矩陣，其中空矩陣中的列數大於多個具有索引的整數子序列的大小，且其中空矩陣中的行數大於在多個具有索引的整數子序列中的每一個中的整數的個數；vii)用多個具有索引的整數子序列和通過應用錯誤校正編碼生成的資料填充空矩陣(例如，通過應用RS編碼的串校正和/或RS編碼的塊校正)；以及viii)基於被填充的矩陣獲得多個具有冗餘的整數子序列，從而獲得核苷酸序列組。 In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided. The method includes: i) converting the input data into a binary string; ii) dividing the binary string into non-overlapping Sequence of 5 bit binary strings; iii) convert each 5 bit binary string to an integer in the range of 0 to 31 to obtain an integer string and convert the integer string using a 5-bit transcoding framework; iv) convert the integer string Divided into multiple initial integer subsequences with a predetermined length; v) Add index information to each of the multiple initial integer subsequences to obtain multiple indexed integer subsequences; vi) Create an empty matrix, where the columns in the empty matrix The number is greater than the size of multiple indexed integer subsequences, and the number of rows in the empty matrix is greater than the number of integers in each of the multiple indexed integer subsequences; vii) use multiple indexed Integer subsequences and data generated by applying error correction coding to fill the empty matrix (for example, by applying RS-coded string correction and/or RS-coded block correction); and viii) obtaining multiple redundant data based on the filled matrix Integer subsequences to obtain a set of nucleotide sequences.
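
The matrix layout in steps vi) to viii) above can be sketched as follows. This is a minimal sketch under stated assumptions: each indexed subsequence is placed in its own column, one extra column and one extra row are reserved for redundancy, and a simple XOR parity stands in for the Reed-Solomon string and block codes named in the text. XOR parity is not Reed-Solomon; it is used only because it keeps every value in the range 0 to 31 and makes the layout easy to follow. The function name `add_parity_matrix` is hypothetical.

```python
from functools import reduce
from operator import xor


def add_parity_matrix(subsequences: list[list[int]]) -> list[list[int]]:
    """Fill a matrix with indexed subsequences plus stand-in parity data."""
    n = len(subsequences)      # number of indexed subsequences
    k = len(subsequences[0])   # integers per subsequence (assumed equal)

    # Empty matrix with more columns than subsequences and more rows
    # than integers per subsequence (here: exactly one extra of each).
    matrix = [[0] * (n + 1) for _ in range(k + 1)]

    # Each subsequence fills one column of the data region.
    for col, seq in enumerate(subsequences):
        for row, value in enumerate(seq):
            matrix[row][col] = value

    # Stand-in for block correction: the extra column holds the XOR of each data row.
    for row in range(k):
        matrix[row][n] = reduce(xor, matrix[row][:n])

    # Stand-in for string correction: the extra row holds the XOR of each column.
    for col in range(n + 1):
        matrix[k][col] = reduce(xor, (matrix[row][col] for row in range(k)))

    # Read the columns back out as subsequences with redundancy.
    return [[matrix[row][col] for row in range(k + 1)] for col in range(n + 1)]


if __name__ == "__main__":
    print(add_parity_matrix([[1, 2], [3, 4]]))
```

A real implementation would replace both parity loops with Reed-Solomon encoding over a suitable symbol field and would size the extra rows and columns according to the desired error tolerance.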

在一些實施型態中，提供一種用於在核酸上儲存輸入資料的方法，該方法包括：i)將輸入資料轉換成二進位串；ii)將二進位串分成非重疊的5比特二進位串的序列；iii)將每個5比特二進位串轉換成0至31範圍內的整數以獲得整數串以及使用5比特轉碼框架轉換該整數串；iv)將該整數串分成多個具有預定長度的初始整數子序列；v)添加索引資訊至多個初始整數子序列中的每一個以獲得多個具有索引的整數子序列；vi)創建空矩陣，其中空矩陣中的列數大於多個具有索引的整數子序列的大小，且其中空矩陣中的行數大於在多個具有索引的整數子序列中的每一個中的整數的個數；vii)用多個具有索引的整數子序列和通過應用錯誤校正編碼生成的資料填充空矩陣(例如，通過應用RS編碼的串校正和/或RS編碼的塊校正)；以及viii)基於被填充的矩陣獲得多個具有冗餘的整數子序列，從而獲得前述核苷酸序列組；以及ix)合成包括該核苷酸序列組的核酸組。 In some embodiments, a method for storing input data on a nucleic acid is provided. The method includes: i) converting the input data into a binary string; ii) dividing the binary string into a sequence of non-overlapping 5-bit binary strings; iii) converting each 5-bit binary string into an integer in the range of 0 to 31 to obtain an integer string, and converting the integer string using a 5-bit transcoding framework; iv) dividing the integer string into a plurality of initial integer subsequences having a predetermined length; v) adding index information to each of the initial integer subsequences to obtain a plurality of indexed integer subsequences; vi) creating an empty matrix, wherein the number of columns in the empty matrix is greater than the number of indexed integer subsequences, and the number of rows in the empty matrix is greater than the number of integers in each of the indexed integer subsequences; vii) filling the empty matrix with the indexed integer subsequences and with data generated by applying error correction coding (for example, by applying RS-coded string correction and/or RS-coded block correction); viii) obtaining a plurality of integer subsequences with redundancy based on the filled matrix, thereby obtaining the set of nucleotide sequences; and ix) synthesizing a set of nucleic acids including the set of nucleotide sequences.

在一些實施型態中，提供一種用於取回儲存在核酸上的輸出資料的方法，該方法包括：i)獲得核酸組的核苷酸序列組，ii)將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列；iii)將該核苷酸序列組轉換成二進位串；以及iv)將該二進位串轉換成輸出資料，從而獲得前述輸出資料。 In some embodiments, a method for retrieving output data stored on a nucleic acid is provided, the method comprising: i) obtaining a nucleotide sequence group of a nucleic acid group, ii) converting the nucleotide sequence group into Multiple integer subsequences including integers in the range of 0-31; iii) converting the nucleotide sequence group into a binary string; and iv) converting the binary string into output data, thereby obtaining the aforementioned output data.

在一些實施型態中，提供一種用於取回儲存在核酸上的輸出資料的方法，該方法包括：i)測序核酸組以生成多個序列讀數；ii)配對、合併和/或過濾以獲得核苷酸序列組；iii)將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列；iv)將錯誤校正編碼應用於前述多個整數子序列，從而獲得多個具有索引的整數子序列；v)將前述具有索引的多個整數子序列轉換成二進位串；以及vi)將該二進位串轉換成輸出資料，從而獲得前述輸出資料。 In some embodiments, a method for retrieving output data stored on a nucleic acid is provided, the method comprising: i) sequencing a nucleic acid set to generate multiple sequence reads; ii) pairing, merging and/or filtering to obtain Nucleotide sequence group; iii) convert the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31; iv) apply error correction codes to the aforementioned multiple integer subsequences, thereby obtaining multiple Indexed integer subsequences; v) converting the aforementioned indexed integer subsequences into a binary string; and vi) converting the binary string into output data to obtain the aforementioned output data.

在一些實施型態中，提供一種用於取回儲存在核酸上的輸出資料的方法，該方法包括：i)測序核酸組以生成多個序列讀數；ii)配對、合併和/或過濾以獲得核苷酸序列組；iii)將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列；iv)將RS編碼串校正應用於前述多個整數子序列以獲得多個一致整數子序列；v)將RS編碼塊校正應用於前述多個一致的整數子序列以獲得多個具有索引的整數子序列；vi)將前述具有索引的多個整數子序列轉換成二進位串；以及vii)將該二進位串轉換成輸出資料，從而獲得前述輸出資料。 In some embodiments, a method for retrieving output data stored on a nucleic acid is provided, the method comprising: i) sequencing a nucleic acid set to generate multiple sequence reads; ii) pairing, merging and/or filtering to obtain Nucleotide sequence group; iii) convert the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31; iv) apply RS coding string correction to the foregoing multiple integer subsequences to obtain multiple Uniform integer subsequences; v) applying RS coding block correction to the foregoing multiple uniform integer subsequences to obtain multiple indexed integer subsequences; vi) converting the foregoing indexed integer subsequences to binary String; and vii) Convert the binary string into output data, thereby obtaining the aforementioned output data.

在一些實施型態中，提供一種用於取回儲存在核酸上的輸出資料的方法，該方法包括：i)測序核酸組以生成多個序列讀數；ii)配對、合併和/或過濾以獲得核苷酸序列組；iii)將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列；iv)將RS編碼串校正應用於前述多個整數子序列以獲得多個一致的整數子序列；v)將RS編碼塊校正應用於前述多個一致的整數子序列以獲得多個具有索引的整數子序列；vi)從前述多個具有索引的整數子序列中去除索引以獲得多個核心整數子序列；vii)將前述核心整數子序列合併為整數串；viii)將前述整數串轉換成二進位串；以及ix)將該二進位串轉換成輸出資料，從而獲得前述輸出資料。 In some embodiments, a method for retrieving output data stored on a nucleic acid is provided, the method comprising: i) sequencing a nucleic acid set to generate multiple sequence reads; ii) pairing, merging and/or filtering to obtain Nucleotide sequence group; iii) convert the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31; iv) apply RS coding string correction to the foregoing multiple integer subsequences to obtain multiple Uniform integer subsequences; v) applying RS coding block correction to the foregoing multiple uniform integer subsequences to obtain multiple indexed integer subsequences; vi) removing indexes from the foregoing multiple indexed integer subsequences To obtain multiple core integer subsequences; vii) merge the aforementioned core integer subsequences into an integer string; viii) convert the aforementioned integer string into a binary string; and ix) convert the binary string into output data to obtain the aforementioned Output data.
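
Steps vi) to ix) of the retrieval method above (removing the index, merging the core subsequences, and converting back to binary and then to output data) might look like the following Python sketch. It assumes that the index occupies the first `index_length` integers of each subsequence, read as a base-32 number, and that the original byte length is known so that padding bits can be dropped. Both assumptions, and all function names, are illustrative and are not taken from this description.

```python
def index_value(subsequence: list[int], index_length: int) -> int:
    """Read the leading index integers as a base-32 number (assumed layout)."""
    value = 0
    for digit in subsequence[:index_length]:
        value = value * 32 + digit
    return value


def integers_to_bytes(integers: list[int], byte_length: int) -> bytes:
    """Convert integers 0..31 back to a binary string, then to bytes."""
    bits = "".join(f"{value:05b}" for value in integers)
    bits = bits[:byte_length * 8]  # drop padding bits added during encoding
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def recover_output(indexed: list[list[int]], index_length: int, byte_length: int) -> bytes:
    """Order by index, strip the index, merge, and decode to bytes."""
    ordered = sorted(indexed, key=lambda seq: index_value(seq, index_length))
    # Core integer subsequences: everything after the index.
    merged = [value for seq in ordered for value in seq[index_length:]]
    return integers_to_bytes(merged, byte_length)


if __name__ == "__main__":
    # Two indexed subsequences, deliberately out of order.
    print(recover_output([[1, 4], [0, 8]], index_length=1, byte_length=1))
```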

在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的電腦實現方法，該方法包括：i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換該二進位串以獲得核苷酸序列組。前述5比特轉碼框架以表2為依據。 In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided. The method includes: i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The 5-bit transcoding framework is based on Table 2.

在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的電腦實現方法，該方法包括：i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換該二進位串以獲得核苷酸序列組。前述5比特轉碼框架以表2為依據。R和Y的選擇是基於：1)與緊鄰R或Y前面的核苷酸不同；和/或2)核苷酸序列的估算GC含量。 In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided. The method includes: i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The 5-bit transcoding framework is based on Table 2. The nucleotide chosen for R or Y is selected based on: 1) being different from the nucleotide immediately preceding the R or Y; and/or 2) the estimated GC content of the nucleotide sequence.

在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的電腦實現方法，該方法包括：i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換該二進位串以獲得核苷酸序列組。輸入資料對應於壓縮檔。壓縮檔是使用朗佩爾-齊科-瑪律可夫鏈演算法(Lempel-Ziv-Markov chain algorithm)(「LZMA」)進行壓縮的。 In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided. The method includes: i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The input data corresponds to a compressed file. The compressed file is compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的電腦實現方法，該方法包括：i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換該二進位串以獲得核苷酸序列組。輸入資料對應於兩個或更多個檔。資料處理步驟進一步包括：將兩個或更多個檔歸組為TAR檔。使用朗佩爾-齊科-瑪律可夫鏈演算法(Lempel-Ziv-Markov chain algorithm)(「LZMA」)將TAR檔進一步壓縮。 In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided. The method includes: i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The input data corresponds to two or more files. The data processing step further includes grouping the two or more files into a TAR file. The TAR file is further compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").
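
The grouping and compression described above can be illustrated with the Python standard library modules `tarfile` and `lzma`. This is a sketch rather than the method of this description: the helper names `pack_files` and `unpack_files` are hypothetical, the files are held in memory as a name-to-bytes mapping for simplicity, and `lzma.compress` with default settings writes an `.xz` container (LZMA2), which is one of several LZMA-family formats the text could be referring to.

```python
import io
import lzma
import tarfile


def pack_files(files: dict[str, bytes]) -> bytes:
    """Group several files into one TAR archive, then compress it."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return lzma.compress(buffer.getvalue())


def unpack_files(blob: bytes) -> dict[str, bytes]:
    """Reverse of pack_files: decompress, then extract every regular file."""
    raw = lzma.decompress(blob)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    blob = pack_files({"a.txt": b"hello", "b.txt": b"world"})
    print(len(blob), unpack_files(blob))
```

The compressed bytes returned by `pack_files` would then serve as the input data for the binary-string conversion.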

在一些實施型態中，提供一種用於將輸入資料轉換成核苷酸序列組的電腦實現方法，該方法包括：i)資料處理步驟，包括將輸入資料轉換成二進位串；以及ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換該二進位串以獲得核苷酸序列組。核苷酸編碼步驟進一步包括將引物序列對附加至前述核苷酸序列組的每個核苷酸序列的5’和3’末端。 In some embodiments, a computer-implemented method for converting input data into a set of nucleotide sequences is provided. The method includes: i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences. The nucleotide encoding step further includes appending a primer sequence pair to the 5' and 3' ends of each nucleotide sequence in the set of nucleotide sequences.

在一些實施型態中，提供一種用於在核酸上儲存輸入資料的方法包括a)將前述輸入資料轉換成核苷酸序列組，其中前述轉換包括i)資料處理步驟，包括將輸入資料轉換成二進位串；ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換二進位串以獲得核苷酸序列組；以及b)合成包括該核苷酸序列組的核酸組。該方法進一步包括附接引物對至前述合成核酸組。 In some embodiments, a method for storing input data on a nucleic acid includes a) converting the foregoing input data into a nucleotide sequence group, wherein the foregoing conversion includes i) a data processing step, including converting the input data into Binary string; ii) Nucleotide encoding step, including converting the binary string using a 5-bit transcoding framework to obtain a nucleotide sequence group; and b) Synthesizing a nucleic acid group including the nucleotide sequence group. The method further includes attaching a primer pair to the aforementioned synthetic nucleic acid set.

在一些實施型態中，提供一種在核酸上儲存兩組或更多組輸入資料的方法，該方法包括：a)根據本發明所記載之任何一種方法，將兩組或更多組輸入資料分別轉換成兩組或更多組相應的核苷酸序列；b)分別將引物序列對附加至兩組或更多組相應的核苷酸序列中的每一組的5’和3’末端，其中前述兩組或更多組相應的核苷酸序列的引物對彼此不同；以及c)分別合成包括前述兩組或更多組相應的核苷酸序列的兩組或更多組核酸。 In some embodiments, a method for storing two or more sets of input data on a nucleic acid is provided. The method includes: a) according to any one of the methods described in the present invention, separate two or more sets of input data Convert into two or more sets of corresponding nucleotide sequences; b) Attach primer sequence pairs to the 5'and 3'ends of each of the two or more sets of corresponding nucleotide sequences, respectively, where The primer pairs of the aforementioned two or more sets of corresponding nucleotide sequences are different from each other; and c) Two or more sets of nucleic acids including the aforementioned two or more sets of corresponding nucleotide sequences are synthesized, respectively.

在一些實施型態中，提供一種在核酸上儲存兩組或更多組輸入資料的方法，該方法包括：a)根據本發明所記載之任何一種方法，將兩組或更多組輸入資料分別轉換成兩組或更多組相應的核苷酸序列；b)分別將引物序列對附加至兩組或更多組相應的核苷酸序列中的每一組的5’和3’末端，其中前述兩組或更多組相應的核苷酸序列的引物對彼此不同；以及c)分別合成包括前述兩組或更多組相應的核苷酸序列的兩組或更多組核酸。其中每對引物具有的序列不同於兩組或更多組相應的核苷酸序列或其互補序列中的任何一個。 In some embodiments, a method for storing two or more sets of input data on a nucleic acid is provided. The method includes: a) according to any one of the methods described in the present invention, separate two or more sets of input data Convert into two or more sets of corresponding nucleotide sequences; b) Attach primer sequence pairs to the 5'and 3'ends of each of the two or more sets of corresponding nucleotide sequences, respectively, where The primer pairs of the aforementioned two or more sets of corresponding nucleotide sequences are different from each other; and c) Two or more sets of nucleic acids including the aforementioned two or more sets of corresponding nucleotide sequences are synthesized, respectively. Each pair of primers has a sequence different from any one of two or more sets of corresponding nucleotide sequences or complementary sequences thereof.

在一些實施型態中，合成核酸組的GC含量範圍為30%至70%。 In some embodiments, the GC content of the synthetic nucleic acid group ranges from 30% to 70%.
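
A GC-content check of this kind is straightforward to express in code. The sketch below computes the fraction of G and C bases in a sequence and tests it against the 30% to 70% range stated above; the function names are illustrative only.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


def within_gc_range(sequence: str, low: float = 0.30, high: float = 0.70) -> bool:
    """True if the GC content lies within [low, high]."""
    return low <= gc_content(sequence) <= high


if __name__ == "__main__":
    for seq in ("ATGCATGC", "AAAAAAAT", "GGGCGCGC"):
        print(seq, gc_content(seq), within_gc_range(seq))
```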

在一些實施型態中，提供一種用於在核酸上儲存輸入資料的方法，該方法包括a)將前述輸入資料轉換成核苷酸序列組，其中前述轉換包括i)資料處理步驟，包括將輸入資料轉換成二進位串；ii)核苷酸編碼步驟，包括使用5比特轉碼框架轉換二進位串以獲得核苷酸序列組；以及b)合成包括前述核苷酸序列組的核酸組。該方法進一步包括儲存前述合成核酸組。 In some embodiments, a method for storing input data on a nucleic acid is provided. The method includes a) converting the foregoing input data into a set of nucleotide sequences, wherein the foregoing conversion includes i) a data processing step, including converting the input The data is converted into a binary string; ii) a nucleotide coding step, which includes converting the binary string using a 5-bit transcoding framework to obtain a nucleotide sequence group; and b) synthesizing a nucleic acid group including the aforementioned nucleotide sequence group. The method further includes storing the aforementioned synthetic nucleic acid set.

在一些實施型態中，前述合成核酸組通過乾燥儲存。在一些實施型態中，前述合成核酸組通過凍乾法儲存。 In some embodiments, the aforementioned synthetic nucleic acid set is stored by drying. In some embodiments, the aforementioned synthetic nucleic acid set is stored by lyophilization.

在一些實施型態中，將合成核酸組固定在載體上，其可為微陣列。 In some embodiments, the synthetic nucleic acid set is fixed on a carrier, which may be a microarray.

在一些實施型態中，提供一種用於取回儲存在核酸上的輸出資料的方法，該方法包括：a)獲得核酸組的核苷酸序列組，b)將該核苷酸序列組轉換成輸出資料，其中前述轉換包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料，從而獲得前述輸出資料。該方法包括在取回輸出資料之前擴增前述核酸組。 In some embodiments, a method for retrieving output data stored on a nucleic acid is provided. The method includes: a) obtaining a nucleotide sequence group of a nucleic acid group, b) converting the nucleotide sequence group into Output data, wherein the aforementioned conversion includes: i) a nucleotide decoding step, including converting the nucleotide sequence group into a binary string using a 5-bit transcoding framework; and ii) a data processing step, including converting the binary string Output data to obtain the aforementioned output data. The method includes amplifying the aforementioned nucleic acid group before retrieving the output data.

在一些實施型態中，提供一種用於取回儲存在核酸上的輸出資料的方法，該方法包括：a)獲得核酸組的核苷酸序列組，b)將該核苷酸序列組轉換成輸出資料，其中前述轉換包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料，從而獲得前述輸出資料。該方法進一步包括測序前述核酸組以生成多個序列讀數。將多個序列讀數配對、合併和過濾以獲得前述核苷酸序列組。 In some embodiments, a method for retrieving output data stored on a nucleic acid is provided. The method includes: a) obtaining a nucleotide sequence group of a nucleic acid group, b) converting the nucleotide sequence group into Output data, wherein the aforementioned conversion includes: i) a nucleotide decoding step, including converting the nucleotide sequence group into a binary string using a 5-bit transcoding framework; and ii) a data processing step, including converting the binary string Output data to obtain the aforementioned output data. The method further includes sequencing the aforementioned nucleic acid set to generate multiple sequence reads. Multiple sequence reads are paired, combined and filtered to obtain the aforementioned set of nucleotide sequences.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。核苷酸解碼步驟將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The nucleotide decoding step converts the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。核苷酸解碼步驟將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列。核苷酸解碼步驟進一步包括將錯誤校正編碼應用於前述多個整數子序列，從而獲得多個具有索引的整數子序列。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The nucleotide decoding step converts the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31. The nucleotide decoding step further includes applying an error correction code to the foregoing multiple integer subsequences, thereby obtaining multiple indexed integer subsequences.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。核苷酸解碼步驟將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列。核苷酸解碼步驟進一步包括將錯誤校正編碼應用於前述多個整數子序列，從而獲得多個具有索引的整數子序列。應用錯誤校正編碼的步驟包括：i)將RS編碼串校正應用於前述多個整數子序列以獲得多個一致整數子序列；以及ii)將RS編碼塊校正應用於前述多個一致整數子序列以獲得多個具有索引的整數子序列。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The nucleotide decoding step converts the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31. The nucleotide decoding step further includes applying an error correction code to the foregoing multiple integer subsequences, thereby obtaining multiple indexed integer subsequences. The steps of applying error correction coding include: i) applying RS coding string correction to the foregoing multiple integer subsequences to obtain multiple uniform integer subsequences; and ii) applying RS coding block correction to the foregoing multiple uniform integer subsequences to Obtain multiple indexed integer subsequences.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5 比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。核苷酸解碼步驟將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列。核苷酸解碼步驟進一步包括將錯誤校正編碼應用於前述多個整數子序列，從而獲得多個具有索引的整數子序列。核苷酸解碼步驟進一步包括從前述多個具有索引的整數子序列中去除索引以獲得多個核心整數子序列。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The nucleotide decoding step converts the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31. The nucleotide decoding step further includes applying an error correction code to the foregoing multiple integer subsequences, thereby obtaining multiple indexed integer subsequences. The nucleotide decoding step further includes removing the index from the aforementioned plurality of indexed integer subsequences to obtain a plurality of core integer subsequences.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。將輸出資料儲存在壓縮檔中。資料處理步驟進一步包括例如藉由通過LZMA演算法解壓壓縮檔。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, which includes converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step, which includes converting the binary string into output data. The output data is stored in a compressed file. The data processing step further includes decompressing the compressed file, for example using the LZMA algorithm.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。輸出資料對應於多個檔。該方法進一步包括通過TAR演算法從輸出資料中提取前述多個檔。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The output data corresponds to multiple files. The method further includes extracting the aforementioned multiple files from the output data through the TAR algorithm.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。核苷酸解碼步驟將該核苷酸序列組轉換成包括0-31範圍內的整數的多個整數子序列。核苷酸解碼步驟進一步包括將錯誤校正編碼應用於前述多個整數子序列，從而獲得多個具有索引的整數子序列。核苷酸解碼步驟進一步包括從前述多個具有索引的整數子序列中去除索引以獲得多個核心整數子序列。核苷酸解碼步驟進一步包括將核心整數子序列合併為整數串以及將前述整數串轉換成二進位串。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The nucleotide decoding step converts the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31. The nucleotide decoding step further includes applying an error correction code to the aforementioned multiple integer subsequences, thereby obtaining multiple indexed integer subsequences. The nucleotide decoding step further includes removing the index from the aforementioned plurality of indexed integer subsequences to obtain a plurality of core integer subsequences. The nucleotide decoding step further includes merging the core integer subsequences into an integer string and converting the aforementioned integer string into a binary string.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。前述5比特轉碼框架以表2為依據。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The aforementioned 5-bit transcoding framework is based on Table 2.

在一些實施型態中，提供一種用於將核苷酸序列組轉換成輸出資料的電腦實現方法，該方法包括：i)核苷酸解碼步驟，包括使用5比特轉碼框架將該核苷酸序列組轉換成二進位串；以及ii)資料處理步驟，包括將該二進位串轉換成輸出資料。前述核酸組包括位於5’和3’末端的引物序列且該方法包括在核苷酸解碼步驟之前去除前述引物序列。 In some embodiments, a computer-implemented method for converting a set of nucleotide sequences into output data is provided. The method includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the nucleotide The sequence group is converted into a binary string; and ii) the data processing step includes converting the binary string into output data. The aforementioned nucleic acid set includes primer sequences at the 5'and 3'ends and the method includes removing the aforementioned primer sequence before the nucleotide decoding step.

在一些實施型態中，提供一種用於基於DNA的資料儲存的電腦可行的方法，該方法包括：將數字化檔轉換成二進位串；使用5比特轉碼框架轉換前述二進位串以獲得整數串；從前述整數串中獲得多個整數子序列；以及將前述多個整數子序列轉換成多個DNA寡聚物的表現形式用於合成DNA。 In some embodiments, a computer-enabled method for DNA-based data storage is provided. The method includes: converting a digitized file into a binary string; converting the binary string using a 5-bit transcoding framework to obtain an integer string; obtaining a plurality of integer subsequences from the integer string; and converting the plurality of integer subsequences into representations of a plurality of DNA oligomers for DNA synthesis.

在一些實施型態中，使用5比特轉碼框架轉換前述二進位串以獲得整數串包括：將二進位串分成非重疊的5比特二進位串的序列；將每個5比特二進位串轉換成0至31範圍內的整數以獲得整數串。在一些實施型態中，將前述整數串進一步分成多個具有預定長度的初始整數子序列。在一些實施型態中，獲得待轉換的多個整數子序列包括：將索引資訊添加至初始多個整數子序列中的每個子序列；在添加索引資訊之後，添加冗餘數據至初始多個整數子序列以獲得多個整數子序列。在一些實施型態中，添加至每個初始多個子序列的索引資訊包括整數串，且其中對應於索引資訊的整數串的長度以數字化檔的大小為基礎。 In some embodiments, using the 5-bit transcoding framework to convert the aforementioned binary string to obtain an integer string includes: dividing the binary string into a sequence of non-overlapping 5-bit binary strings; converting each 5-bit binary string into Integers in the range 0 to 31 to obtain integer strings. In some embodiments, the aforementioned integer string is further divided into a plurality of initial integer sub-sequences having a predetermined length. In some implementations, obtaining multiple integer subsequences to be converted includes: adding index information to each of the initial multiple integer subsequences; after adding index information, adding redundant data to the initial multiple integers Subsequence to obtain multiple integer subsequences. In some implementations, the index information added to each initial plurality of subsequences includes integer strings, and the length of the integer string corresponding to the index information is based on the size of the digitized file.
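
One way to make the index length depend on the file size, as described above, is to use the smallest number of base-32 digits that can number every subsequence. The Python sketch below follows that idea. It is an assumption made for illustration: this description states only that the index length is based on the size of the digitized file, and the function names and the choice to place the index at the front of each subsequence are hypothetical.

```python
def index_length(num_subsequences: int) -> int:
    """Smallest count of base-32 digits able to number every subsequence."""
    length = 1
    while 32 ** length < num_subsequences:
        length += 1
    return length


def index_digits(position: int, length: int) -> list[int]:
    """Write `position` as exactly `length` base-32 digits, most significant first."""
    digits = []
    for _ in range(length):
        digits.append(position % 32)
        position //= 32
    return digits[::-1]


def add_index(subsequences: list[list[int]]) -> list[list[int]]:
    """Prefix each initial subsequence with its index (integers 0..31)."""
    length = index_length(len(subsequences))
    return [index_digits(i, length) + seq for i, seq in enumerate(subsequences)]


if __name__ == "__main__":
    print(add_index([[5, 6], [7, 8], [9, 10]]))
```

Because the index is itself a string of integers in the range 0 to 31, it can pass through the same transcoding step as the payload.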

在一些實施型態中，前述方法包括添加冗餘數據至多個整數子序列，其可以包括，例如，獲得初始多個整數子序列的子集；選擇空矩陣，其中空矩陣中的列數大於子集中子序列的數量，且其中空矩陣中的行數大於子集的每個子序列中的整數的個數；用初始多個整數子序列的子集和對應於錯誤校正碼的資料填充空矩陣；以及基於被填充的矩陣獲得多個整數子序列。在一些實施型態中，空矩陣的列數基於錯誤校正碼的類型、預定的容錯度值，子集的大小或其組合來選擇。在一些實施型態中，空矩陣的行數基於錯誤校正碼的類型、預定的容錯度值、子集的大小或其組合來選擇。 In some embodiments, the foregoing method includes adding redundant data to multiple integer subsequences, which may include, for example, obtaining a subset of the initial multiple integer subsequences; selecting an empty matrix, where the number of columns in the empty matrix is greater than the sub The number of subsequences in the set, and the number of rows in the empty matrix is greater than the number of integers in each subsequence of the subset; fill the empty matrix with a subset of the initial multiple integer subsequences and data corresponding to the error correction code; And obtain multiple integer subsequences based on the filled matrix. In some implementations, the number of columns of the empty matrix is selected based on the type of error correction code, the predetermined error tolerance value, the size of the subset, or a combination thereof. In some implementations, the number of rows of the empty matrix is selected based on the type of error correction code, the predetermined error tolerance value, the size of the subset, or a combination thereof.

在一些實施型態中，錯誤校正碼是里德-所羅門(「RS」)碼。在一些實施型態中，將多個整數子序列轉換成多個DNA寡聚物的表現形式包括將前述多個整數子序列的整數轉換成三個核苷酸的表現形式，其中：這三個核苷酸的第一個選自A、T、G和C，這三個核苷酸的第二個選自A、T、G和C，且這三個核苷酸的第三個選自兩種選擇之一。 In some embodiments, the error correction code is a Reed-Solomon ("RS") code. In some embodiments, converting the plurality of integer subsequences into representations of a plurality of DNA oligomers includes converting an integer of the plurality of integer subsequences into a representation of three nucleotides, wherein: the first of the three nucleotides is selected from A, T, G, and C; the second of the three nucleotides is selected from A, T, G, and C; and the third of the three nucleotides is selected from one of two options.
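
The three-nucleotide representation gives 4 x 4 x 2 = 32 combinations, which matches the 32 values of a 5-bit integer. The Python sketch below shows one possible mapping of this shape. It is not the mapping of Table 2, which is not reproduced in this section: the bit layout, the base order `ATGC`, and the reading of the two third-position options as the IUPAC classes R (A or G) and Y (C or T) are all assumptions. Within the chosen class, the concrete base is picked so that it differs from the preceding nucleotide and, where both bases qualify, so that it nudges the running GC content toward 50%.

```python
BASES = "ATGC"                      # assumed order for the first two positions
THIRD_BASE_CLASSES = ("AG", "CT")   # assumed: R (purines), Y (pyrimidines)


def encode(integers: list[int]) -> str:
    """Map each integer 0..31 to three nucleotides (hypothetical table)."""
    sequence: list[str] = []
    for value in integers:
        if not 0 <= value <= 31:
            raise ValueError(f"value out of range: {value}")
        first = BASES[value >> 3]            # top two bits
        second = BASES[(value >> 1) & 3]     # middle two bits
        # Lowest bit picks the class; drop the base equal to the preceding one.
        candidates = [b for b in THIRD_BASE_CLASSES[value & 1] if b != second]
        sequence += [first, second]
        gc_fraction = sum(b in "GC" for b in sequence) / len(sequence)
        want_gc = gc_fraction < 0.5
        third = next((b for b in candidates if (b in "GC") == want_gc), candidates[0])
        sequence.append(third)
    return "".join(sequence)


def decode(sequence: str) -> list[int]:
    """Inverse of encode: the third base contributes only its class bit."""
    values = []
    for i in range(0, len(sequence), 3):
        first, second, third = sequence[i:i + 3]
        class_bit = 0 if third in "AG" else 1
        values.append((BASES.index(first) << 3) | (BASES.index(second) << 1) | class_bit)
    return values


if __name__ == "__main__":
    encoded = encode([0, 17, 31])
    print(encoded, decode(encoded))
```

Because decoding looks only at the class of the third base, the encoder is free to choose either member of the class without losing information.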

在一些實施型態中，數字化檔是對應於由一個或更多個檔或目錄構成的組的壓縮檔。在一些實施型態中，數字化檔包括使用朗佩爾-齊科-瑪律可夫鏈演算法壓縮的對應於由一個或更多個檔或目錄構成的組的LZMA檔。 In some embodiments, the digitized file is a compressed file corresponding to a group of one or more files or directories. In some embodiments, the digitized file includes an LZMA file that corresponds to a group of one or more files or directories and is compressed using the Lempel-Ziv-Markov chain algorithm.

在根據上述任何一個實施型態的一些實施型態中，其中前述方法進一步包括：添加表示引物對的資料至多個DNA寡聚物的表現形式的每個寡聚物表現形式；以及在添加表示引物對的資訊之後，從而基於多個DNA寡聚物的表現形式進行DNA合成。 In some embodiments according to any one of the embodiments above, the method further includes: adding data representing a primer pair to each oligomer representation of the representations of the plurality of DNA oligomers; and, after adding the data representing the primer pair, performing DNA synthesis based on the representations of the plurality of DNA oligomers.

在一些實施型態中，前述方法進一步包括：獲得第二數字化檔；基於第二數字化檔獲得第二多個DNA寡聚物的表現形式；添加表示第二引物對的資料至第二多個DNA寡聚物的表現形式的每個寡聚物表現形式，其中第二引物對與第一引物對不同；以及基於多個DNA寡聚物的表現形式和第二多個DNA寡聚物的表現形式進行DNA合成。 In some embodiments, the foregoing method further includes: obtaining a second digitized file; obtaining a representation of the second plurality of DNA oligomers based on the second digitized file; adding data representing the second primer pair to the second plurality of DNA Each oligomer representation of the oligomer, where the second primer pair is different from the first primer pair; and based on the representation of multiple DNA oligomers and the representation of second multiple DNA oligomers Perform DNA synthesis.

在一些實施型態中，提供一種用於基於DNA的資料取回的電腦可行的方法，該方法包括：獲得多個對應於數字化檔的讀數；基於前述多個讀數，獲得多個整數子序列；將前述多個整數子序列轉換成整數串；使用5比特框架將前述整數串轉換成二進位串；以及基於二進位串獲得數字化檔。在一些實施型態中，獲得多個對應於前述數字化檔的讀數包括：識別與數字化檔預關聯的引物。在一些實施型態中，獲得多個整數子序列包括進行基於多個讀數的基於頻率的錯誤校正。在一些實施型態中，使用5比特轉碼框架將整數串轉換成二進位串包括：將整數串的每個整數轉換成5比特二進位數字。 In some embodiments, a computer-feasible method for DNA-based data retrieval is provided. The method includes: obtaining multiple readings corresponding to digitized files; based on the foregoing multiple readings, obtaining multiple integer subsequences; Converting the aforementioned multiple integer subsequences into an integer string; using a 5-bit frame to convert the aforementioned integer string into a binary string; and obtaining a digitized file based on the binary string. In some embodiments, obtaining multiple readings corresponding to the aforementioned digital file includes identifying primers pre-associated with the digital file. In some embodiments, obtaining multiple integer subsequences includes performing frequency-based error correction based on multiple readings. In some implementations, using a 5-bit transcoding framework to convert an integer string to a binary string includes: converting each integer of the integer string to a 5-bit binary number.
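
Frequency-based error correction over multiple reads can be read as a per-position majority vote among reads of the same oligomer. The Python sketch below implements that reading. It is an interpretation, not a procedure given in this description: it assumes the reads have already been grouped by oligomer, discards reads whose length differs from the most common length, and does not handle insertions or deletions.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority-vote consensus of reads assumed to cover the same oligomer."""
    # Keep only reads of the most common length, so positions line up.
    common_length = Counter(len(read) for read in reads).most_common(1)[0][0]
    usable = [read for read in reads if len(read) == common_length]
    # At each position, keep the most frequently observed base.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*usable)
    )


if __name__ == "__main__":
    print(consensus(["ATGCAT", "ATGCAT", "ATTCAT", "ATGCA"]))
```

The consensus sequences would then be converted to integer subsequences and passed to the error correction decoding described elsewhere in this description.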

In some embodiments, a non-transitory computer-readable storage medium storing one or more programs is provided. The one or more programs comprise instructions which, when executed by one or more processors of an electronic device, cause the electronic device to: convert a digitized file into a binary string; convert the binary string using a 5-bit transcoding framework to obtain an integer string; obtain a plurality of integer subsequences from the integer string; and convert the plurality of integer subsequences into a plurality of DNA oligomer representations for DNA synthesis.

In some embodiments, a system for providing DNA-based data storage is provided. The system comprises: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including: instructions for converting a digitized file into a binary string; instructions for converting the binary string using a 5-bit encoding framework to obtain an integer string; instructions for obtaining a plurality of integer subsequences from the integer string; and instructions for converting the plurality of integer subsequences into a plurality of DNA oligomer representations.

In some embodiments, a non-transitory computer-readable storage medium storing one or more programs is provided. The one or more programs comprise instructions which, when executed by one or more processors of an electronic device, cause the electronic device to: obtain a plurality of reads corresponding to a digitized file; obtain a plurality of integer subsequences based on the plurality of reads; convert the plurality of integer subsequences into an integer string; convert the integer string into a binary string using a 5-bit framework; and obtain the digitized file based on the binary string.

In some embodiments, a system for providing DNA-based data storage is provided. The system comprises: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including: instructions for obtaining a plurality of reads corresponding to a digitized file; instructions for obtaining a plurality of integer subsequences based on the plurality of reads; instructions for converting the plurality of integer subsequences into an integer string; instructions for converting the integer string into a binary string using a 5-bit framework; and instructions for obtaining the digitized file based on the binary string.

根據示例性實現方法，前述方法的不同步驟通過一個或更多個電腦軟體程式實現，此軟體程式包括設計為由根據本發明的繼電器模組的資料處理器執行的軟體指令以及設計為控制該方法的不同步驟的執行的軟體指令。 According to an exemplary implementation method, the different steps of the aforementioned method are implemented by one or more computer software programs, which include software instructions designed to be executed by the data processor of the relay module according to the invention and designed to control the method Software instructions for the execution of different steps.

因此，本發明一方面亦關於易於由電腦或由資料處理器執行的程式，此程式包括命令以控制如上前述的方法的步驟的執行。 Therefore, on the one hand, the present invention also relates to a program that is easy to be executed by a computer or by a data processor. This program includes commands to control the execution of the steps of the aforementioned method.

此方法可以以原始碼、目標代碼或在原始碼和目標代碼之間的代碼的形式，例如以部分編譯的形式或以任何其他所需形式，使用任何程式設計語言。 This method can use any programming language in the form of source code, object code, or code between source code and object code, for example, in the form of partial compilation or in any other desired form.

本發明亦關於一種可由資料處理器讀取並包括如上所述的程式的指令的資訊介質。 The invention also relates to an information medium that can be read by the data processor and includes instructions of the program as described above.

資訊介質可以是能夠儲存程式的任何實體或設備。例如，該介質可以包括諸如ROM(其代表「唯讀記憶體」)的存放裝置，例如CD-ROM(其代表「光碟唯讀記憶體」)或微電子電路ROM或磁記錄設備，例如軟碟或硬碟驅動器。 The information medium may be any entity or device capable of storing programs. For example, the medium may include a storage device such as a ROM (which stands for "read only memory"), such as a CD-ROM (which stands for "optical disk read only memory") or a microelectronic circuit ROM or magnetic recording device, such as a floppy disk Or hard drive.

進一步地，資訊介質可以是通過無線電或其他方式的能傳送的載體，例如能通過電纜和光纜送達的電信號或光信號。該程式尤其可以是下載到網際網路類型的網路中。 Further, the information medium may be a carrier that can be transmitted by radio or other means, for example, an electrical signal or an optical signal that can be delivered through a cable and an optical cable. In particular, the program can be downloaded into an Internet-type network.

作為選擇，資訊介質可以是包含前述程式的積體電路，該電路適於執行或用於執行所討論的方法。 Alternatively, the information medium may be an integrated circuit containing the aforementioned program, which is suitable for performing or for performing the method in question.

根據一個實施型態，借助於軟體及/或硬體元件來實現本發明的實施型態。從這個觀點來看，術語「模組」在此文檔中可以對應於軟體元件和硬體元件或一組硬體和軟體元件。 According to an implementation form, the implementation form of the present invention is implemented by means of software and/or hardware components. From this point of view, the term "module" in this document may correspond to software components and hardware components or a group of hardware and software components.

軟體元件對應於一個或更多個電腦程式、一個或更多個程式的副程式或更通常地對應於程式或軟體程式的任何元素，其能夠根據下面針對所涉及的模組描述的內容實現一個功能或一組功能。一個如此的軟體元件通過物理實體(終端、伺服器等)的資料處理器執行且能夠存取該物理實體(記憶體、記錄介質、通信匯流排、輸入/輸出電子板、使用者介面等)的硬體資源。 A software component corresponds to one or more computer programs, a subprogram of one or more programs, or more generally to any element of a program or a software program, which can implement one according to the content described below for the modules involved Function or set of functions. Such a software component is executed by a data processor of a physical entity (terminal, server, etc.) and can access the physical entity (memory, recording medium, communication bus, input/output electronic board, user interface, etc.) Hardware resources.

相似地，硬體元件對應於能夠根據下面針對所涉及的模組描述的內容實現一個功能或一組功能的硬體單元的任何元素。它可以是可程式設計硬體元件或具有用於執行軟體的積體電路的元件，例如積體電路、智慧卡、記憶體卡、用於執行固件的電子板等。在變體中，硬體元件包括作為積體電路的處理器，例如中央處理單元及/或微處理器和/或專用積體電路(ASIC)及/或專用指令集處理器(ASIP)及/或圖形處理單元(GPU)及/或物理處理單元(PPU)及/或數字訊號處理器(DSP)及/或圖像處理器及/或輔助處理器及/或浮點單元及/或網路處理器及/或音訊處理器及/或多核處理器。此外，硬體元件還可包括基帶處理器(包括例如記憶體單元和固件)和/或接收或發送無線電信號的無線電電子電路(其可包括電線)。在一個實施型態中，硬體元件符合一個或多個標準，例如ISO/IEC 18092/ECMA-340、ISO/IEC 21481/ECMA-352、GSMA、StoLPaN、ETSI/SCP(智慧卡平台)、GlobalPlatform(即安全元件)。在一個變形例中，硬體元件是射頻識別(RFID)標籤。在一個實施型態中，硬體元件包括實現藍牙通信及/或Wi-Fi通信及/或Zigbee通信及/或USB通信及/或火線通信和/或NFC(用於近場)通信的電路。 Similarly, a hardware element corresponds to any element of a hardware unit that can implement a function or a group of functions according to the content described below for the involved module. It can be a programmable hardware component or a component with an integrated circuit for executing software, such as an integrated circuit, a smart card, a memory card, an electronic board for executing firmware, etc. In a variant, the hardware element includes a processor as an integrated circuit, such as a central processing unit and/or a microprocessor and/or a dedicated integrated circuit (ASIC) and/or a dedicated instruction set processor (ASIP) and/or Or graphics processing unit (GPU) and/or physical processing unit (PPU) and/or digital signal processor (DSP) and/or image processor and/or auxiliary processor and/or floating point unit and/or network Processor and/or audio processor and/or multi-core processor. In addition, the hardware component may also include a baseband processor (including, for example, a memory unit and firmware) and/or a radio electronic circuit (which may include wires) that receives or transmits radio signals. In one embodiment, the hardware components comply with one or more standards, such as ISO/IEC 18092/ECMA-340, ISO/IEC 21481/ECMA-352, GSMA, StoLPaN, ETSI/SCP (Smart Card Platform), GlobalPlatform (Ie secure element). 
In a variant, the hardware element is a radio frequency identification (RFID) tag. In one embodiment, the hardware components include circuits that implement Bluetooth communication and/or Wi-Fi communication and/or Zigbee communication and/or USB communication and/or FireWire communication and/or NFC (for near field) communication.

應當注意，獲得本發明中的元素/值的步驟可以被視為在電子設備的記憶體單元中讀取這種元素/值的步驟或者通過通信手段接收來自另一電子設備的這種元素/值的步驟。 It should be noted that the step of obtaining the element/value in the present invention may be regarded as a step of reading such element/value in the memory unit of the electronic device or receiving such element/value from another electronic device through communication means A step of.

示例性過程 Exemplary process

圖1表示根據一些實施型態的用於提供基於DNA的資料儲存和取回的示例性過程。具體地，示例性步驟102-110涉及編碼數字資料用於儲存，且示例性步驟112-122涉及解碼儲存資訊用於取回。下文，參考圖2-5，進一步具體描述圖1中的示例性步驟。 FIG. 1 shows an exemplary process for providing DNA-based data storage and retrieval according to some embodiments. Specifically, exemplary steps 102-110 involve encoding digital data for storage, and exemplary steps 112-122 involve decoding stored information for retrieval. Hereinafter, referring to FIGS. 2-5, the exemplary steps in FIG. 1 will be described in further detail.

1. Encoding

In step 102 ("data compression"), one or more files and/or directories are packed into a single file, which is then compressed. In some embodiments, the files and/or directories are packed into a TAR file (e.g., File.tar), which is then compressed into an LZMA file (e.g., File.tar.lzma) using the Lempel-Ziv-Markov chain algorithm (LZMA). In some embodiments, one LZMA file operates as a single indivisible unit for data retrieval (e.g., during decoding). Therefore, if multiple files and directories are to be stored together but retrieved randomly and independently, they should be grouped into multiple TAR files and compressed into multiple corresponding LZMA files at this step.
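As a non-limiting illustration, the packing and compression of step 102 (and its inverse, used during decoding) could be sketched in Python with the standard `tarfile` and `lzma` modules. The function names `pack_and_compress` and `decompress_and_unpack`, and the choice to work on in-memory byte strings rather than files on disk, are assumptions made for this sketch and are not taken from the described embodiments.

```python
import io
import lzma
import tarfile


def pack_and_compress(files: dict[str, bytes]) -> bytes:
    """Pack named byte blobs into one TAR archive, then LZMA-compress it."""
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # FORMAT_ALONE selects the legacy ".lzma" container.
    return lzma.compress(tar_buffer.getvalue(), format=lzma.FORMAT_ALONE)


def decompress_and_unpack(blob: bytes) -> dict[str, bytes]:
    """Inverse of pack_and_compress: LZMA-decompress, then read the TAR members."""
    tar_bytes = lzma.decompress(blob)  # container format is auto-detected
    files = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files
```

Each call to `pack_and_compress` yields one compressed unit; files that must be retrievable independently would be passed in separate calls.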

在步驟104中，實施第一輪資料轉碼。首先，將每個LZMA 檔轉換為二進位串。作為實施例，參考圖2中，名為「File.tar.lzma」的檔被轉換為二進位串。然後將二進位字元串轉換為整數串B(「0；10；25；...；4；8；31」)。在所描繪的實施例中，使用5比特轉碼框架實現從二進位串到整數串B的轉換。如圖所示，二進位串被分成一系列非重疊的5比特二進位串，例如「00000」和「01010」。然後將每個5比特二進位串轉換為整數以形成整數串B。所屬技術領域之通常知識者應知悉，在該5比特轉碼框架下，整數字串中的每個整數範圍是從0(對應於「00000」)到31(對應於「11111」)。 In step 104, the first round of data transcoding is implemented. First, convert each LZMA file to a binary string. As an example, referring to FIG. 2, a file named "File.tar.lzma" is converted to a binary string. Then convert the binary character string into an integer string B ("0; 10; 25; ...; 4; 8; 31"). In the depicted embodiment, the conversion from a binary string to an integer string B is achieved using a 5-bit transcoding framework. As shown in the figure, the binary string is divided into a series of non-overlapping 5-bit binary strings, such as "00000" and "01010". Each 5-bit binary string is then converted to an integer to form an integer string B. Those of ordinary skill in the art should know that, under the 5-bit transcoding framework, each integer in the integer string ranges from 0 (corresponding to "00000") to 31 (corresponding to "11111").
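The 5-bit transcoding of step 104 can be illustrated with the following minimal Python sketch. The source does not state how a binary string whose length is not a multiple of 5 is handled; the zero padding on the right and the `n_bytes` argument used to discard that padding on the way back are assumptions of this sketch, as are the function names.

```python
def bytes_to_quintets(data: bytes) -> list[int]:
    """Split the bit string of `data` into non-overlapping 5-bit groups (0..31)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Assumption: pad with zeros on the right up to a multiple of 5 bits.
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]


def quintets_to_bytes(values: list[int], n_bytes: int) -> bytes:
    """Inverse conversion; `n_bytes` tells how many bytes to keep (drops padding)."""
    bits = "".join(f"{value:05b}" for value in values)
    if len(bits) < 8 * n_bytes:
        raise ValueError("not enough integers for the requested byte count")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))
```

For instance, the two bytes `0x02 0x80` have the bit string `0000001010000000`, whose first two 5-bit groups are "00000" and "01010", that is, the integers 0 and 10.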

As shown in FIG. 2, the integer string B is then divided into a plurality of non-overlapping integer subsequences (e.g., [A1, A2, ..., An]) using a fixed-length sliding window. In the embodiment depicted in FIG. 2, each integer subsequence (e.g., A1) consists of 22 integers. Finally, index information is prepended to each subsequence to form a new plurality of indexed integer subsequences (e.g., [B1, B2, ..., Bn]). In the depicted embodiment, the index information is a sequence of 3 integers, each ranging from 0 to 31. The length of the index sequence can be selected based on various factors, such as the size of the compressed file and the yield of DNA synthesis.
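The splitting and indexing described above might be sketched as follows. This is an illustration under stated assumptions: the index is taken to be the running subsequence number written as three base-32 digits, and a short final subsequence is padded with zeros; neither detail is specified in the source, and the names `index_digits` and `split_and_index` are hypothetical.

```python
def index_digits(number: int, index_len: int = 3) -> list[int]:
    """Write `number` as `index_len` base-32 digits, most significant first (assumption)."""
    if not 0 <= number < 32 ** index_len:
        raise ValueError("index does not fit in the chosen index length")
    return [(number // 32 ** power) % 32 for power in reversed(range(index_len))]


def split_and_index(values: list[int], width: int = 22, index_len: int = 3) -> list[list[int]]:
    """Cut `values` into non-overlapping windows of `width` and prepend an index to each."""
    indexed = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]
        # Assumption: pad the last window with zeros to the fixed width.
        chunk = chunk + [0] * (width - len(chunk))
        indexed.append(index_digits(start // width, index_len) + chunk)
    return indexed
```

With a 3-integer index, this scheme can label at most 32³ = 32768 subsequences, which is one way the index length depends on the size of the data to be stored.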

Returning to FIG. 1, in step 106, the plurality of indexed integer subsequences (e.g., [B1, B2, ..., Bn] as shown in FIG. 2) is further converted into a plurality of integer subsequences with index and redundancy (e.g., [C1, C2, ..., Cm] as shown in FIG. 4). Various error-correcting coding algorithms, such as Reed-Solomon (RS) coding, fountain coding, and Hamming coding, can be used to add redundant data to the digital data to be stored. In a preferred embodiment, RS coding is used because of its robustness and ease of implementation.

FIGS. 3A-D show an exemplary process for adding index and redundancy to digital content (e.g., represented by a plurality of integer subsequences [A1, A2, ..., An]) to obtain [C1, C2, ..., Cm]. Specifically, FIGS. 3A-D show how the first five integer subsequences (i.e., A1, A2, A3, A4, and A5) are processed using RS coding to form [C1, C2, ..., C31]. For the remaining integer subsequences (i.e., A6, ..., An), every five consecutive integer subsequences are processed as one unit in a manner similar to that shown in FIGS. 3A-D. In this embodiment, five integer subsequences are processed together in a 29×31 matrix, so that the parity for block correction is 26 (i.e., 31−5=26); thus up to 13 (i.e., 26/2=13) of the 31 oligomers may be lost and still be recovered according to the principle of RS coding.

Referring to FIG. 3A, an empty 29×31 matrix is prepared and filled with the first five integer subsequences A1, A2, A3, A4, and A5 from [A1, A2, ..., An], which are shown occupying a 22×5 submatrix. This region is the core data block.

Turning to FIG. 3B, an index sequence consisting of three integers in the range 0 to 31 is attached to the beginning of each column as a unique index; the index strings can be sorted before they are attached. As shown in the figure, the indexes are sorted or assigned in ascending order, such as 0-0-0, 0-0-1, 0-0-2, ..., 0-0-31, 0-1-31, .... In FIG. 3B, the indexed integer strings are labeled B1, B2, B3, B4, and B5, respectively.

Referring to FIG. 3C, RS coding is used to fill, row by row, the blank area of each row occupied by the core data block. This step is called "block correction" and helps handle, for example, oligomers lost during synthesis and sequencing, indels (insertions and deletions), and degradation during long-term storage.

回到圖3D，RS編碼用於逐列填充整個矩陣的每列的空白區域。該步驟被稱為「串校正」並且有助於校正例如在合成、測序和長期儲存期間引起的點突變。如圖3D所示，矩陣現在包括31個整數串[C1， C2，...，C31]。換言之，在塊校正和字串校正之後，前述5個整數子序列A1-A5被轉化為31個整數子序列C1-C31。此外，A1-A5中的每一個包含22個整數，而C1-C31中的每一個包含29個整數(包括3個額外的索引整數和4個用於錯誤校正的RS編碼的同位的額外整數)。應該理解的是，如圖3A-D所示的各種尺寸僅是示例性的。索引串的長度(圖3A-D中的3)、矩陣的大小(例如，圖3A-D中的29×31)以及要作為單元處理的整數串的數量(例如，圖3A-D中的5)可以基於多種因素選擇，例如所使用的錯誤編碼的類型、所需的容錯度和DNA合成平台的特徵。 Returning to FIG. 3D, RS coding is used to fill the blank area of each column of the entire matrix column by column. This step is called "string correction" and helps correct for point mutations caused during synthesis, sequencing, and long-term storage, for example. As shown in FIG. 3D, the matrix now includes 31 integer strings [C1, C2, ..., C31]. In other words, after the block correction and the string correction, the aforementioned 5 integer sub-sequences A1-A5 are converted into 31 integer sub-sequences C1-C31. In addition, each of A1-A5 contains 22 integers, while each of C1-C31 contains 29 integers (including 3 additional index integers and 4 parity extra integers for RS encoding for error correction) . It should be understood that the various dimensions shown in FIGS. 3A-D are merely exemplary. The length of the index string (3 in Figures 3A-D), the size of the matrix (for example, 29×31 in Figures 3A-D), and the number of integer strings to be treated as units (for example, 5 in Figures 3A-D) ) Can be selected based on a variety of factors, such as the type of error code used, the required fault tolerance, and the characteristics of the DNA synthesis platform.

如圖4所示，通過根據參考圖3A-D描述的技術的RS編碼的一輪串校正和RS編碼的一輪塊校正，將具有索引的多個整數子序列，[B1，B2，...，Bn]，轉化為具有冗餘的多個整數子序列，[C1，C2，...，Cm]，其中m大於n。此外，整數子序列[C1，C2，...，Cm]中的每個整數的範圍為0至31。 As shown in FIG. 4, through a round of string correction of RS coding and a round of block correction of RS coding according to the technique described with reference to FIGS. 3A-D, multiple integer subsequences with indexes, [B1, B2, ..., Bn], converted to multiple integer subsequences with redundancy, [C1, C2, ..., Cm], where m is greater than n. In addition, each integer in the integer subsequence [C1, C2, ..., Cm] ranges from 0 to 31.

In the embodiment depicted in FIGS. 3A-D, the length of an initial integer subsequence such as A1 (22 in the depicted embodiment) is calculated from several factors. Specifically, the length of an integer string with index and redundancy (denoted L; 29 in the depicted embodiment) is calculated from the oligomer length of the synthesis platform. The parity for string correction (denoted X; 4 in the depicted embodiment) and for block correction is determined by the synthesis error rate of the oligomers, the error-correcting code used, and the required error tolerance. The index length (denoted Y; 3 in the depicted embodiment) is determined by the total size of the encoded data. The length of the initial integer string (denoted Z) is therefore Z = L − X − Y, which gives 29 − 4 − 3 = 22 in the depicted embodiment.
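The relation Z = L − X − Y can be checked with a short calculation. The sketch below assumes that each integer occupies 3 nucleotides (as in the 5-bit transcoding framework) and takes the 87-nt oligomer payload from the worked example described later in this document; the function name `payload_layout` is hypothetical.

```python
NT_PER_INTEGER = 3  # each integer 0..31 is written as one trinucleotide


def payload_layout(payload_nt: int, parity_len: int, index_len: int) -> tuple[int, int]:
    """Return (L, Z): integers per oligomer and data integers per oligomer."""
    if payload_nt % NT_PER_INTEGER:
        raise ValueError("payload length must be a multiple of 3 nt")
    total_len = payload_nt // NT_PER_INTEGER      # L
    data_len = total_len - parity_len - index_len  # Z = L - X - Y
    if data_len <= 0:
        raise ValueError("no room left for data")
    return total_len, data_len
```

With an 87-nt payload, X = 4, and Y = 3, this gives L = 29 and Z = 22, matching the depicted embodiment.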

Returning to FIG. 1, in step 108, a second round of transcoding is performed to convert the list of integer strings with redundancy (e.g., [C1, C2, ..., Cm]) into a plurality of DNA oligomer representations (e.g., [D1, D2, ..., Dm]). Each DNA oligomer representation consists of the four bases A, T, G, and C used for synthesis. In particular, the "5-bit transcoding framework" can be used again. Here, each integer in the integer strings [C1, C2, ..., Cm] ranges from 0 to 31 and can therefore be uniquely mapped to one of 32 trinucleotides (i.e., trimers of the form NNY or NNR, where N represents A, T, G, or C; Y represents C or T; and R represents A or G). For example, as shown in FIG. 5, the integer 6 corresponds to the 5-bit binary string "00110" and can be translated into "AGR" under a particular strategy. In some embodiments, the 5-bit transcoding framework can provide a direct conversion between integers and DNA oligomer representations without any intermediate step (e.g., without first converting the integer into a binary string).

Thus, each of the 29 integers in each integer subsequence (e.g., C1) can be mapped to a trinucleotide. After all of [C1, C2, ..., Cm] have been converted, and before DNA synthesis, each Y is replaced with C or T and each R is replaced with A or G. This is done to ensure that the third base of each trimer differs from its second base and to avoid three consecutive identical bases (e.g., AAA, GGG, TTT, CCC). In addition, through the choice of Y and R, the GC content of each oligomer should be kept between 30% and 70%. The substitution step both reduces errors introduced by oligomer synthesis and is important for improving the proportion of correctly synthesized oligomers.
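One possible realization of the integer-to-trimer mapping and of the Y/R substitution is sketched below. The 2-bit base code (00→A, 01→T, 10→C, 11→G) and the rule "last bit 0→R, 1→Y" are hypothetical: they were chosen only so that the integer 6 maps to "AGR" as in the example above, and the actual table of FIG. 5 may differ. The source selects among candidate bases at random; this sketch instead uses a deterministic greedy choice that nudges the running GC fraction toward 50%.

```python
BASES = "ATCG"                     # hypothetical 2-bit code: 00->A, 01->T, 10->C, 11->G
DEGENERATE = {"R": "AG", "Y": "CT"}
GC = set("GC")


def integer_to_trimer(value: int) -> str:
    """Map 0..31 to a degenerate trimer NNR/NNY (hypothetical table)."""
    if not 0 <= value <= 31:
        raise ValueError("value must be in 0..31")
    return BASES[value >> 3] + BASES[(value >> 1) & 0b11] + ("Y" if value & 1 else "R")


def resolve(trimers: list[str]) -> str:
    """Replace each R/Y by a concrete base that differs from the preceding base."""
    seq, gc = [], 0
    for first, second, code in trimers:
        seq += [first, second]
        gc += (first in GC) + (second in GC)
        candidates = [b for b in DEGENERATE[code] if b != second]
        if len(candidates) == 2:
            # Greedy GC balancing (assumption; the source picks at random here).
            want_gc = 2 * gc < len(seq)
            choice = next(b for b in candidates if (b in GC) == want_gc)
        else:
            choice = candidates[0]
        seq.append(choice)
        gc += choice in GC
    return "".join(seq)


def trimer_to_integer(trimer: str) -> int:
    """Inverse mapping: A/G in third position reads as R (bit 0), C/T as Y (bit 1)."""
    first, second, third = trimer
    low_bit = 0 if third in DEGENERATE["R"] else 1
    return (BASES.index(first) << 3) | (BASES.index(second) << 1) | low_bit
```

Because every third base differs from the base before it, a run of identical bases in the resolved sequence can span at most three positions.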

According to the principle of RS coding, the tolerable errors in the exemplary scheme shown in FIGS. 3A-D include two mutations per oligomer (i.e., half of the string-correction parity of 4) and 13 lost oligomers out of the 31 oligomers from the same matrix (i.e., half of the block-correction parity of 26), where lost oligomers include both oligomers that are missing entirely and oligomers with indels.
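The error budget stated above follows from simple arithmetic on the matrix dimensions, which the following sketch reproduces. It only restates the accounting given in this document (half of the parity symbols per code word); it does not implement RS coding, and the names are hypothetical.

```python
def tolerable_errors(parity_symbols: int) -> int:
    """Number of symbol errors tolerated per code word, per the accounting above."""
    return parity_symbols // 2


ROWS, COLUMNS = 29, 31              # matrix of FIGS. 3A-D
INDEX_LEN, DATA_LEN, DATA_COLUMNS = 3, 22, 5

string_parity = ROWS - INDEX_LEN - DATA_LEN   # 4 parity integers per oligomer
block_parity = COLUMNS - DATA_COLUMNS         # 26 parity oligomers per matrix

mutations_per_oligomer = tolerable_errors(string_parity)   # 2
lost_oligomers_per_matrix = tolerable_errors(block_parity)  # 13
```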

Referring to FIG. 1, in step 110, primer pairs are attached and DNA synthesis is performed. In some embodiments, a single compressed file (e.g., File.tar.lzma of FIG. 4) is converted into a plurality of DNA oligomer representations (e.g., [D1, D2, ..., Dn] in FIG. 4), and a representation of the same primer pair is added to the two ends of each oligomer corresponding to that compressed file. For multiple compressed files that are to be stored and synthesized together but must remain randomly accessible during subsequent reading and decoding, a unique orthogonal primer pair is selected for, and associated with, each compressed file. For example, if 3 compressed files are to be stored and synthesized together but accessed randomly during subsequent reading and decoding, 3 unique orthogonal primer pairs are selected and associated with the 3 compressed files, respectively. For each compressed file, the selected primer pair is attached to every oligomer of the plurality of oligomers corresponding to that file. All oligomers corresponding to the multiple compressed files can then be pooled and synthesized together as the storage medium.
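The attachment of primer representations in step 110 can be illustrated as plain string concatenation. In this sketch the primer pair is treated simply as two flanking sequences written in the same orientation as the payload, which is a simplification; the function names and the dictionary layout are assumptions made for illustration.

```python
def attach_primers(payloads: list[str], forward: str, reverse: str) -> list[str]:
    """Flank every oligomer representation of one file with the same primer pair."""
    return [forward + payload + reverse for payload in payloads]


def build_pool(files: dict[str, tuple[tuple[str, str], list[str]]]) -> list[str]:
    """Merge the primer-tagged oligomers of several files into one synthesis pool.

    `files` maps a file name to ((forward, reverse), payloads).
    """
    pairs = [pair for pair, _ in files.values()]
    if len(set(pairs)) != len(pairs):
        raise ValueError("each compressed file needs its own primer pair")
    pool = []
    for (forward, reverse), payloads in files.values():
        pool.extend(attach_primers(payloads, forward, reverse))
    return pool
```

With 20-nt primers and an 87-nt payload, each resulting oligomer representation is 127 nt long. Checking that the primer pairs are mutually orthogonal is outside the scope of this sketch.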

Various criteria can be used to select the primer pairs. For example, primer pairs can be selected so as to avoid homodimers, heterodimers, and hairpin structures, and to have sufficient specificity (e.g., no binding site within the encoding nucleic acid sequences). In some examples, multiplex PCR primer design criteria are used.

2. Decoding

The decoding procedure is essentially the reverse of the encoding procedure. Referring to FIG. 1, in step 112, PCR is performed with a primer pair to amplify the list of oligomers (e.g., [D1, D2, ..., Dn] in FIG. 4) of the corresponding compressed file (e.g., File.tar of FIG. 4). If multiple compressed files are to be read and decoded in a single NGS run, all of their corresponding oligomer lists should be amplified by PCR using all of the corresponding primer pairs. This step is also called "NGS library preparation".

In step 114, paired-end next-generation sequencing is performed (e.g., on an Illumina sequencing system), followed by read pairing and merging. Specifically, the forward and reverse reads from the same cluster are paired and merged into a single read, and all merged reads with irregular lengths (e.g., reads with indels) are filtered out. In addition, all reads can be grouped by compressed file according to their primer sequences. In the subsequent steps, reads corresponding to the same compressed file (i.e., reads sharing the same primers) are analyzed together.
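The length filtering and primer-based grouping of merged reads might look like the following sketch. It assumes exact primer matches at both ends of a read, whereas a practical pipeline would likely tolerate some mismatches; the function name and argument layout are hypothetical.

```python
def group_reads_by_primer(
    reads: list[str],
    primer_pairs: dict[str, tuple[str, str]],
    payload_len: int = 87,
) -> dict[str, list[str]]:
    """Assign merged reads to files by primer pair and drop irregular lengths.

    Returns, per file name, the payloads with primers already stripped.
    """
    groups = {name: [] for name in primer_pairs}
    for read in reads:
        for name, (forward, reverse) in primer_pairs.items():
            # Assumption: primers must match exactly at both ends.
            if read.startswith(forward) and read.endswith(reverse):
                payload = read[len(forward):len(read) - len(reverse)]
                if len(payload) == payload_len:  # irregular length suggests an indel
                    groups[name].append(payload)
                break
    return groups
```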

In step 116, the RS coding is reversed. In some embodiments, a 29×31 matrix initialized with zeros, rather than an empty matrix, is used. Specifically, the PCR primers are removed from both ends of each read belonging to a single compressed file, and the read is then converted into an integer subsequence with RS string correction, which corrects errors caused by mutations. Because one oligomer may have many molecular copies after synthesis and may be sequenced many times, many reads may derive from the same oligomer. These reads may differ because of errors introduced during high-throughput synthesis and sequencing, but the correct reads should dominate. Through highest-frequency-based correction at each position of the integer subsequence, all integer subsequences sharing the same index can be corrected and merged into one consensus integer subsequence. For example, for a group of reads sharing the same index, each position of the consensus integer subsequence is determined by the integer that occurs most frequently at that position.
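The highest-frequency-based correction can be illustrated by a per-position majority vote over all integer subsequences that share an index. The sketch assumes that the subsequences passed in already have equal length and that the index itself was read correctly; it omits the RS string correction that precedes this step in the described embodiment. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_index(subsequences: list[list[int]], index_len: int = 3) -> dict[tuple, list[int]]:
    """Merge subsequences sharing an index by majority vote at each position."""
    groups = defaultdict(list)
    for subsequence in subsequences:
        groups[tuple(subsequence[:index_len])].append(subsequence)
    consensus = {}
    for index, members in groups.items():
        # zip(*members) iterates over positions; most_common(1) picks the top integer.
        consensus[index] = [Counter(column).most_common(1)[0][0] for column in zip(*members)]
    return consensus
```

When two integers are equally frequent at a position, `Counter.most_common` returns the one encountered first, so ties are resolved by read order in this sketch.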

In step 118, the list of integer strings is fully decoded by RS block correction, which recovers lost oligomers as well as oligomers with insertions and deletions. Because one oligomer may have many molecular copies after synthesis and may be sequenced many times, many reads may represent the same oligomer. These reads may differ because of errors introduced during high-throughput synthesis and sequencing, but the correct reads, which match the originally designed oligomer, still have the advantage in count. Through highest-frequency-based correction at each position of the integer string, all integer strings sharing the same index can be corrected and merged into one consensus integer string between string correction and block correction. Because oligomers with insertions and deletions have irregular lengths and are discarded during error correction, their data are treated as missing information and must be recovered. Based on the index information, the columns of the matrix are filled in after the highest-frequency-based correction.

In step 120, transcoding is performed. The integer subsequences are sorted by index, and the index is then removed from each of them. All integer subsequences can then be concatenated into a single integer string, which is converted into a binary string through the 5-bit transcoding framework.
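Step 120 can be sketched as the inverse of the first round of transcoding: order by index, strip the index, concatenate, and expand each integer to 5 bits. The `n_bytes` argument, which discards padding bits, is an assumption of this sketch (the source does not say how the original length is recorded), and the function name is hypothetical.

```python
def reassemble(indexed: list[list[int]], n_bytes: int, index_len: int = 3) -> bytes:
    """Sort subsequences by index, drop the index, and convert 5-bit integers to bytes."""
    ordered = sorted(indexed, key=lambda subsequence: subsequence[:index_len])
    values = [v for subsequence in ordered for v in subsequence[index_len:]]
    bits = "".join(f"{v:05b}" for v in values)
    if len(bits) < 8 * n_bytes:
        raise ValueError("not enough integers for the requested byte count")
    # Assumption: anything beyond n_bytes is padding and is ignored.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))
```

The resulting byte string is what would be written out as the compressed file before decompression in step 122.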

在步驟122中，進行解壓縮。具體地，系統將二進位串寫入壓縮檔，然後依次通過LZMA演算法和TAR演算法解壓壓縮檔。對於多個壓縮檔的隨機存取，應該獨立地為每個壓縮檔執行步驟116到122。池(pool)可以儲存多個壓縮檔。每個壓縮檔都有自己的PCR引物。在解碼期間，不必對整個池進行測序。相反，相應的PCR引物用於擴增某個壓縮檔的寡聚物，然後對擴增的寡聚物進行測序以解碼此相應的壓縮檔而不是整個池。 In step 122, decompression is performed. Specifically, the system writes the binary string to the compressed file, and then decompresses the compressed file through the LZMA algorithm and the TAR algorithm in sequence. For random access to multiple compressed files, steps 116 to 122 should be performed independently for each compressed file. The pool can store multiple compressed files. Each compressed file has its own PCR primer. During decoding, it is not necessary to sequence the entire pool. Instead, the corresponding PCR primers are used to amplify the oligomer of a certain compressed file, and then sequence the amplified oligomer to decode this corresponding compressed file rather than the entire pool.

As discussed above, a 5-bit transcoding framework is used. Specifically, every 5 consecutive bits of the binary string can be expressed as an integer between 0 and 31 and subsequently as 3 nucleotides (nt), i.e., a trimer. DNA oligomers are composed of four bases (A, T, G, and C), so there are 16 dimers (i.e., NN: AA, AT, AG, AC, TA, TT, TG, TC, GA, GT, GG, GC, CA, CT, CG, and CC). If a degenerate base R or Y is appended to each dimer, there are 32 trimers (NNR/NNY), which match the 32 integers in the range 0 to 31 and allow the binary string to be converted cleanly into a DNA sequence. During oligomer synthesis, whether A or G is chosen for R, and whether C or T is chosen for Y, depends on the preceding base (i.e., the second base of the trimer): the system makes the second and third bases differ while maintaining GC balance. Provided these conditions are met, the exact base is chosen at random among the candidate bases. In summary, the coding potential of this transcoding framework is 1.67 bits per nucleotide (i.e., 5 bits per 3 nt).

圖7示出了基於DNA的資料儲存和取回技術的示例性實現。在這裡，包含如圖6所示的中文字元的文字檔(資料大小：1.16kb)根據本發明所記載之過程經由DNA儲存。 FIG. 7 shows an exemplary implementation of DNA-based data storage and retrieval technology. Here, a text file (data size: 1.16 kb) containing Chinese characters as shown in FIG. 6 is stored via DNA according to the process described in the present invention.

During encoding, the text file is compressed into a single compressed file, which is then stored through the DNA storage framework using 403 oligomers, each 87 nt long. To simulate random access, 6 copies of the compressed file are used and 6 primer pairs are selected. Each primer pair is added to the two ends of each of the 403 oligomers. The 6 primer pairs (20 nt per primer) are orthogonal, meaning that any two of them are separated by a sufficient Hamming distance and that each has low similarity to every one of the 403 oligomers. The sequence listing in the ASCII text file submitted herewith includes SEQ ID NO.1-SEQ ID NO.403 and the primer pairs PP NO.1-PP NO.6 as SEQ ID NO.404-415.

The oligomer pool is then synthesized. A total of 2418 oligomers (i.e., 403 × 6) were synthesized using the CustomArray platform developed by CustomArray, Inc. Each oligomer is 127 nt long, which includes 40 nt of primer sequence (20 nt at each end).
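
The pool arithmetic (403 payloads × 6 primer pairs = 2418 oligomers, 20 + 87 + 20 = 127 nt each) can be checked with a short sketch. The payloads and primers below are placeholder strings rather than real sequences, `build_pool` is a hypothetical helper, and simple concatenation of the primers as given is an assumption; strand orientation of the reverse primer is not modeled.

```python
PAYLOAD_COUNT = 403  # oligomers per copy of the compressed file
PAYLOAD_NT = 87      # payload length in nucleotides
PRIMER_NT = 20       # length of each primer
COPIES = 6           # one primer pair per copy


def build_pool(payloads, primer_pairs):
    """Flank every payload with every primer pair (forward + payload + reverse)."""
    return [fwd + payload + rev
            for fwd, rev in primer_pairs
            for payload in payloads]


# Placeholder sequences, used only to verify counts and lengths.
payloads = ["A" * PAYLOAD_NT for _ in range(PAYLOAD_COUNT)]
primer_pairs = [("C" * PRIMER_NT, "G" * PRIMER_NT) for _ in range(COPIES)]
pool = build_pool(payloads, primer_pairs)
```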

PCR amplification and NGS are then performed. Six PCR reactions were performed on the compressed-file copies. Six samples were prepared using the TruSeq DNA PCR-free HT library preparation kit (96 indexes in plate format, 96 samples) and 6 library indexes; because the oligomers are 127 nt long, the pooled samples were then sequenced together using the MiSeq Reagent Kit v3 (150 cycles). The NGS data had a Q30 of 94% (official standard: >85%) and a cluster density of 1,301 K/mm² (official standard: 1200-1400 K/mm²).
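
For readers unfamiliar with the Q30 metric, the sketch below shows how it is conventionally computed: a Phred score Q corresponds to an error probability of 10^(-Q/10), and Q30 is the fraction of base calls with Q ≥ 30. The function names are illustrative, and the Phred+33 ASCII offset is the usual FASTQ convention assumed here; none of this code comes from the source.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(c) - 33 for c in quality_string]


def q30_fraction(qualities):
    """Fraction of base calls whose quality score is at least 30."""
    if not qualities:
        raise ValueError("no quality scores given")
    return sum(q >= 30 for q in qualities) / len(qualities)
```

A reported Q30 of 94% therefore means that 94% of base calls carry an estimated error probability of at most 1 in 1000.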

Finally, decoding is performed. After each copy of the compressed file was decoded independently, every copy could be randomly accessed, successfully retrieved, and decompressed without any errors.

FIG. 8 presents a device that can be used to perform one or more steps of the method of the present invention. This device, labeled 800, includes a computing unit (e.g., a central processing unit, CPU), labeled 801, and one or more memory units, labeled 802 (e.g., a RAM ("random access memory") block in which intermediate results can be stored temporarily during execution of the instructions of a computer program, a ROM block that stores, among other things, the computer program, an EEPROM ("electrically erasable programmable read-only memory") block, or a flash memory block). The computer program consists of instructions executable by the computing unit. The device 800 may also include a dedicated unit, labeled 803, which constitutes an input-output interface allowing the device 800 to communicate with other devices. In particular, this dedicated unit 803 can be connected to an antenna (for contactless communication) or to a serial port (for contact communication). Note that these units can exchange data with one another via, for example, a bus.

In an alternative embodiment, some or all of the steps of the previously described method can be implemented in hardware, in a programmable FPGA ("field-programmable gate array") component or an ASIC ("application-specific integrated circuit") component.

In an alternative embodiment, some or all of the steps of the previously described method can be executed on an electronic device that includes a memory unit and a processing unit (as disclosed in FIG. 8). Such a device 800 can be used in combination with a high-throughput synthesis platform (e.g., CustomArray) and a DNA sequencer (e.g., a MiSeq sequencer).

FIG. 9A depicts an exemplary method 900 for storing input data on nucleic acids. At block 902, the input data is converted into a set of nucleotide sequences. Specifically, at block 904, the input data is converted into a binary string, and at block 906, the binary string is converted using the 5-bit transcoding framework to obtain the set of nucleotide sequences. At block 908, a set of nucleic acids comprising the set of nucleotide sequences is synthesized.

FIG. 9B depicts an exemplary method 950 for retrieving output data stored on nucleic acids. At block 952, the set of nucleotide sequences of a set of nucleic acids is obtained. At block 954, the set of nucleotide sequences is converted into output data. Specifically, at block 956, the set of nucleotide sequences is converted into a binary string using the 5-bit transcoding framework. At block 958, the binary string is converted into the output data.
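
Blocks 904 and 958 are inverses of each other: one turns input data into a binary string, the other turns a binary string back into output data. The sketch below shows one way this pair could look. The 32-bit length header and the zero padding to a multiple of 5 bits are assumptions introduced here so that the string fits the 5-bit frame and the padding can be stripped on retrieval; the source does not specify how padding is handled, and the function names are hypothetical.

```python
HEADER_BITS = 32  # assumed length prefix, not taken from the source


def bytes_to_bits(data):
    """Block 904 (sketch): bytes -> binary string padded to a multiple of 5."""
    header = format(len(data), "0{}b".format(HEADER_BITS))
    body = "".join(format(byte, "08b") for byte in data)
    bits = header + body
    # Pad with zeros so the 5-bit transcoding frame divides the string evenly.
    return bits + "0" * (-len(bits) % 5)


def bits_to_bytes(bits):
    """Block 958 (sketch): binary string -> bytes, dropping header and padding."""
    length = int(bits[:HEADER_BITS], 2)
    body = bits[HEADER_BITS:HEADER_BITS + 8 * length]
    return bytes(int(body[i:i + 8], 2) for i in range(0, len(body), 8))
```

For example, `bits_to_bytes(bytes_to_bits("中文".encode("utf-8")))` returns the original UTF-8 bytes, regardless of how many padding bits were appended.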

Although the present invention and its embodiments have been fully described with reference to the accompanying drawings, it should be noted that various changes and modifications will be apparent to those of ordinary skill in the art. Such changes and modifications should be understood to fall within the scope of the disclosure and embodiments as defined by the claims.

For purposes of explanation, the foregoing description has been given with reference to specific embodiments. However, the illustrative discussion above is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles of the technique and its practical applications, so that those of ordinary skill in the art can best use the technique and the various embodiments with whatever modifications suit the particular use contemplated.

This application claims the benefit of Chinese Patent Application No. 201710611123.2, filed on July 25, 2017, the entire contents of which are incorporated herein by reference for all purposes.

The content of the following submission in an ASCII text file is incorporated herein by reference in its entirety: a computer readable form (CRF) of the sequence listing (file name: 申請號107127162序列表電子資料.TXT, date recorded: November 30, 2018, size: 179 KB).

<110> Nanjing GenScript Biotech Co., Ltd. (Mainland China)

<120> DNA-based data access

<130> 75989-20003.40

<140> 107127162

<141> 2018-08-03

<150> CN201710611123.2

<151> 2018-07-25

<160> 415

<170> FastSEQ for Windows Version 4.0

<210> 1

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 1

<210> 2

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 2

<210> 3

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 3

<210> 4

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 4

<210> 5

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 5

<210> 6

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 6

<210> 7

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 7

<210> 8

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 8

<210> 9

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 9

<210> 10

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 10

<210> 11

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 11

<210> 12

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 12

<210> 13

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 13

<210> 14

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 14

<210> 15

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 15

<210> 16

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 16

<210> 17

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 17

<210> 18

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 18

<210> 19

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 19

<210> 20

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 20

<210> 21

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 21

<210> 22

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 22

<210> 23

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 23

<210> 24

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 24

<210> 25

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 25

<210> 26

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 26

<210> 27

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 27

<210> 28

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 28

<210> 29

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 29

<210> 30

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 30

<210> 31

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 31

<210> 32

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 32

<210> 33

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 33

<210> 34

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 34

<210> 35

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 35

<210> 36

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 36

<210> 37

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 37

<210> 38

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 38

<210> 39

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 39

<210> 40

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 40

<210> 41

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 41

<210> 42

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 42

<210> 43

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 43

<210> 44

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 44

<210> 45

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 45

<210> 46

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 46

<210> 47

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 47

<210> 48

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 48

<210> 49

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 49

<210> 50

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 50

<210> 51

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 51

<210> 52

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 52

<210> 53

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 53

<210> 54

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 54

<210> 55

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 55

<210> 56

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 56

<210> 57

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 57

<210> 58

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 58

<210> 59

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 59

<210> 60

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 60

<210> 61

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 61

<210> 62

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 62

<210> 63

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 63

<210> 64

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 64

<210> 65

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 65

<210> 66

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 66

<210> 67

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 67

<210> 68

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 68

<210> 69

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 69

<210> 70

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 70

<210> 71

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 71

<210> 72

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 72

<210> 73

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 73

<210> 74

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 74

<210> 75

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 75

<210> 76

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 76

<210> 77

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 77

<210> 78

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 78

<210> 79

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 79

<210> 80

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 80

<210> 81

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 81

<210> 82

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 82

<210> 83

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 83

<210> 84

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 84

<210> 85

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 85

<210> 86

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 86

<210> 87

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 87

<210> 88

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 88

<210> 89

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 89

<210> 90

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 90

<210> 91

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 91

<210> 92

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 92

<210> 93

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 93

<210> 94

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 94

<210> 95

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 95

<210> 96

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 96

<210> 97

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 97

<210> 98

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 98

<210> 99

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 99

<210> 100

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 100

<210> 101

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 101

<210> 102

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 102

<210> 103

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 103

<210> 104

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 104

<210> 105

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 105

<210> 106

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 106

<210> 107

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 107

<210> 108

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 108

<210> 109

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 109

<210> 110

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 110

<210> 111

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 111

<210> 112

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 112

<210> 113

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 113

<210> 114

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 114

<210> 115

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 115

<210> 116

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 116

<210> 117

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 117

<210> 118

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 118

<210> 119

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 119

<210> 120

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 120

<210> 121

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 121

<210> 122

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 122

<210> 123

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 123

<210> 124

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 124

<210> 125

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 125

<210> 126

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 126

<210> 127

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 127

<210> 128

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 128

<210> 129

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 129

<210> 130

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 130

<210> 131

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 131

<210> 132

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 132

<210> 133

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 133

<210> 134

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 134

<210> 135

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 135

<210> 136

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 136

<210> 137

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 137

<210> 138

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 138

<210> 139

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 139

<210> 140

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 140

<210> 141

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 141

<210> 142

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 142

<210> 143

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 143

<210> 144

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 144

<210> 145

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 145

<210> 146

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 146

<210> 147

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 147

<210> 148

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 148

<210> 149

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 149

<210> 150

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 150

<210> 151

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 151

<210> 152

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 152

<210> 153

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 153

<210> 154

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 154

<210> 155

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 155

<210> 156

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 156

<210> 157

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 157

<210> 158

<211> 87

<212> DNA

<213> Artificial sequence

<220>

<223> Synthetic construct

<400> 158

<210> 159

<211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 159

<400> 159

<210> 160 <210> 160

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 160

<400> 160

<210> 161 <210> 161

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 161

<400> 161

<210> 162 <210> 162

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 162

<400> 162

<210> 163 <210> 163

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 163

<400> 163

<210> 164 <210> 164

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 164

<400> 164

<210> 165 <210> 165

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 165

<400> 165

<210> 166 <210> 166

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 166

<400> 166

<210> 167 <210> 167

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 167

<400> 167

<210> 168 <210> 168

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 168

<400> 168

<210> 169 <210> 169

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 169

<400> 169

<210> 170 <210> 170

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 170

<400> 170

<210> 171 <210> 171

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 171

<400> 171

<210> 172 <210> 172

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 172

<400> 172

<210> 173 <210> 173

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 173

<400> 173

<210> 174 <210> 174

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 174

<400> 174

<210> 175 <210> 175

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 175

<400> 175

<210> 176 <210> 176

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 176

<400> 176

<210> 177 <210> 177

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 177

<400> 177

<210> 178 <210> 178

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 178

<400> 178

<210> 179 <210> 179

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 179

<400> 179

<210> 180 <210> 180

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 180

<400> 180

<210> 181 <210> 181

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 181

<400> 181

<210> 182 <210> 182

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 182

<400> 182

<210> 183 <210> 183

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 183

<400> 183

<210> 184 <210> 184

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 184

<400> 184

<210> 185 <210> 185

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 185

<400> 185

<210> 186 <210> 186

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 186

<400> 186

<210> 187 <210> 187

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 187

<400> 187

<210> 188 <210> 188

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 188

<400> 188

<210> 189 <210> 189

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 189

<400> 189

<210> 190 <210> 190

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 190

<400> 190

<210> 191 <210> 191

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 191

<400> 191

<210> 192 <210> 192

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 192

<400> 192

<210> 193 <210> 193

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 193

<400> 193

<210> 194 <210> 194

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 194

<400> 194

<210> 195 <210> 195

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 195

<400> 195

<210> 196 <210> 196

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 196

<400> 196

<210> 197 <210> 197

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 197

<400> 197

<210> 198 <210> 198

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 198

<400> 198

<210> 199 <210> 199

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 199

<400> 199

<210> 200 <210> 200

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 200

<400> 200

<210> 201 <210> 201

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 201

<400> 201

<210> 202 <210> 202

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 202

<400> 202

<210> 203 <210> 203

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 203

<400> 203

<210> 204 <210> 204

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 204

<400> 204

<210> 205 <210> 205

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 205

<400> 205

<210> 206 <210> 206

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 206

<400> 206

<210> 207 <210> 207

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 207

<400> 207

<210> 208 <210> 208

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 208

<400> 208

<210> 209 <210> 209

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 209

<400> 209

<210> 210 <210> 210

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 210

<400> 210

<210> 211 <210> 211

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 211

<400> 211

<210> 212 <210> 212

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 212

<400> 212

<210> 213 <210> 213

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 213

<400> 213

<210> 214 <210> 214

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 214

<400> 214

<210> 215 <210> 215

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 215

<400> 215

<210> 216 <210> 216

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 216

<400> 216

<210> 217 <210> 217

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 217

<400> 217

<210> 218 <210> 218

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 218

<400> 218

<210> 219 <210> 219

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 219

<400> 219

<210> 220 <210> 220

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 220

<400> 220

<210> 221 <210> 221

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 221

<400> 221

<210> 222 <210> 222

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 222

<400> 222

<210> 223 <210> 223

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 223

<400> 223

<210> 224 <210> 224

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 224

<400> 224

<210> 225 <210> 225

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 225

<400> 225

<210> 226 <210> 226

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 226

<400> 226

<210> 227 <210> 227

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 227

<400> 227

<210> 228 <210> 228

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 228

<400> 228

<210> 229 <210> 229

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 229

<400> 229

<210> 230 <210> 230

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 230

<400> 230

<210> 231 <210> 231

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 231

<400> 231

<210> 232 <210> 232

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 232

<400> 232

<210> 233 <210> 233

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 233

<400> 233

<210> 234 <210> 234

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 234

<400> 234

<210> 235 <210> 235

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 235

<400> 235

<210> 236 <210> 236

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 236

<400> 236

<210> 237 <210> 237

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 237

<400> 237

<210> 238 <210> 238

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 238

<400> 238

<210> 239 <210> 239

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 239

<400> 239

<210> 240 <210> 240

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 240

<400> 240

<210> 241 <210> 241

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 241

<400> 241

<210> 242 <210> 242

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 242

<400> 242

<210> 243 <210> 243

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 243

<400> 243

<210> 244 <210> 244

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 244

<400> 244

<210> 245 <210> 245

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 245

<400> 245

<210> 246 <210> 246

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 246

<400> 246

<210> 247 <210> 247

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 247

<400> 247

<210> 248 <210> 248

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 248

<400> 248

<210> 249 <210> 249

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 249

<400> 249

<210> 250 <210> 250

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 250

<400> 250

<210> 251 <210> 251

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 251

<400> 251

<210> 252 <210> 252

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 252

<400> 252

<210> 253 <210> 253

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 253

<400> 253

<210> 254 <210> 254

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 254

<400> 254

<210> 255 <210> 255

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 255

<400> 255

<210> 256 <210> 256

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 256

<400> 256

<210> 257 <210> 257

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 257

<400> 257

<210> 258 <210> 258

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 258

<400> 258

<210> 259 <210> 259

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 259

<400> 259

<210> 260 <210> 260

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 260

<400> 260

<210> 261 <210> 261

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 261

<400> 261

<210> 262 <210> 262

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 262

<400> 262

<210> 263 <210> 263

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 263

<400> 263

<210> 264 <210> 264

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 264

<400> 264

<210> 265 <210> 265

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 265

<400> 265

<210> 266 <210> 266

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 266

<400> 266

<210> 267 <210> 267

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 267

<400> 267

<210> 268 <210> 268

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 268

<400> 268

<210> 269 <210> 269

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 269

<400> 269

<210> 270 <210> 270

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 270

<400> 270

<210> 271 <210> 271

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 271

<400> 271

<210> 272 <210> 272

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 272

<400> 272

<210> 273 <210> 273

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 273

<400> 273

<210> 274 <210> 274

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 274

<400> 274

<210> 275 <210> 275

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 275

<400> 275

<210> 276 <210> 276

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 276

<400> 276

<210> 277 <210> 277

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 277

<400> 277

<210> 278 <210> 278

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 278

<400> 278

<210> 279 <210> 279

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 279

<400> 279

<210> 280 <210> 280

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 280

<400> 280

<210> 281 <210> 281

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 281

<400> 281

<210> 282 <210> 282

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 282

<400> 282

<210> 283 <210> 283

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 283

<400> 283

<210> 284 <210> 284

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 284

<400> 284

<210> 285 <210> 285

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 285

<400> 285

<210> 286 <210> 286

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 286

<400> 286

<210> 287 <210> 287

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 287

<400> 287

<210> 288 <210> 288

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 288

<400> 288

<210> 289 <210> 289

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 289

<400> 289

<210> 290 <210> 290

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 290

<400> 290

<210> 291 <210> 291

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 291

<400> 291

<210> 292 <210> 292

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 292

<400> 292

<210> 293 <210> 293

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 293

<400> 293

<210> 294 <210> 294

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 294

<400> 294

<210> 295 <210> 295

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 295

<400> 295

<210> 296 <210> 296

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 296

<400> 296

<210> 297 <210> 297

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 297

<400> 297

<210> 298 <210> 298

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 298

<400> 298

<210> 299 <210> 299

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 299

<400> 299

<210> 300 <210> 300

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 300

<400> 300

<210> 301 <210> 301

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 301

<400> 301

<210> 302 <210> 302

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 302

<400> 302

<210> 303 <210> 303

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 303

<400> 303

<210> 304 <210> 304

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 304

<400> 304

<210> 305 <210> 305

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 305

<400> 305

<210> 306 <210> 306

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 306

<400> 306

<210> 307 <210> 307

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 307

<400> 307

<210> 308 <210> 308

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 308

<400> 308

<210> 309 <210> 309

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 309

<400> 309

<210> 310 <210> 310

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 310

<400> 310

<210> 311 <210> 311

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 311

<400> 311

<210> 312 <210> 312

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 312

<400> 312

<210> 313 <210> 313

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 313

<400> 313

<210> 314 <210> 314

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 314

<400> 314

<210> 315 <210> 315

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 315

<400> 315

<210> 316 <210> 316

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 316

<400> 316

<210> 317 <210> 317

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 317

<400> 317

<210> 318 <210> 318

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 318

<400> 318

<210> 319 <210> 319

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 319

<400> 319

<210> 320 <210> 320

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 320

<400> 320

<210> 321 <210> 321

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 321

<400> 321

<210> 322 <210> 322

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 322

<400> 322

<210> 323 <210> 323

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 323

<400> 323

<210> 324 <210> 324

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 324

<400> 324

<210> 325 <210> 325

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 325

<400> 325

<210> 326 <210> 326

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 326

<400> 326

<210> 327 <210> 327

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 327

<400> 327

<210> 328 <210> 328

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 328

<400> 328

<210> 329 <210> 329

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 329

<400> 329

<210> 330 <210> 330

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 330

<400> 330

<210> 331 <210> 331

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 331

<400> 331

<210> 332 <210> 332

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 332

<400> 332

<210> 333 <210> 333

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 333

<400> 333

<210> 334 <210> 334

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 334

<400> 334

<210> 335 <210> 335

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 335

<400> 335

<210> 336 <210> 336

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 336

<400> 336

<210> 337 <210> 337

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 337

<400> 337

<210> 338 <210> 338

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 338

<400> 338

<210> 339 <210> 339

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 339

<400> 339

<210> 340 <210> 340

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 340

<400> 340

<210> 341 <210> 341

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 341

<400> 341

<210> 342 <210> 342

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 342

<400> 342

<210> 343 <210> 343

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 343

<400> 343

<210> 344 <210> 344

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 344

<400> 344

<210> 345 <210> 345

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 345

<400> 345

<210> 346 <210> 346

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 346

<400> 346

<210> 347 <210> 347

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 347

<400> 347

<210> 348 <210> 348

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 348

<400> 348

<210> 349 <210> 349

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 349

<400> 349

<210> 350 <210> 350

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 350

<400> 350

<210> 351 <210> 351

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 351

<400> 351

<210> 352 <210> 352

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 352

<400> 352

<210> 353 <210> 353

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 353

<400> 353

<210> 354 <210> 354

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 354

<400> 354

<210> 355 <210> 355

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 355

<400> 355

<210> 356 <210> 356

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 356

<400> 356

<210> 357 <210> 357

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 357

<400> 357

<210> 358 <210> 358

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 358

<400> 358

<210> 359 <210> 359

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 359

<400> 359

<210> 360 <210> 360

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 360

<400> 360

<210> 361 <210> 361

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 361

<400> 361

<210> 362 <210> 362

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 362

<400> 362

<210> 363 <210> 363

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 363

<400> 363

<210> 364 <210> 364

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 364

<400> 364

<210> 365 <210> 365

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 365

<400> 365

<210> 366 <210> 366

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 366

<400> 366

<210> 367 <210> 367

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 367

<400> 367

<210> 368 <210> 368

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 368

<400> 368

<210> 369 <210> 369

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 369

<400> 369

<210> 370 <210> 370

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 370

<400> 370

<210> 371 <210> 371

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 371

<400> 371

<210> 372 <210> 372

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 372

<400> 372

<210> 373 <210> 373

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 373

<400> 373

<210> 374 <210> 374

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 374

<400> 374

<210> 375 <210> 375

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 375

<400> 375

<210> 376 <210> 376

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 376

<400> 376

<210> 377 <210> 377

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 377

<400> 377

<210> 378 <210> 378

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 378

<400> 378

<210> 379 <210> 379

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 379

<400> 379

<210> 380 <210> 380

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 380

<400> 380

<210> 381 <210> 381

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 381

<400> 381

<210> 382 <210> 382

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 382

<400> 382

<210> 383 <210> 383

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 383

<400> 383

<210> 384 <210> 384

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 384

<400> 384

<210> 385 <210> 385

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 385

<400> 385

<210> 386 <210> 386

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 386

<400> 386

<210> 387 <210> 387

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 387

<400> 387

<210> 388 <210> 388

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 388

<400> 388

<210> 389 <210> 389

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 389

<400> 389

<210> 390 <210> 390

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 390

<400> 390

<210> 391 <210> 391

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 391

<400> 391

<210> 392 <210> 392

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 392

<400> 392

<210> 393 <210> 393

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 393

<400> 393

<210> 394 <210> 394

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 394

<400> 394

<210> 395 <210> 395

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 395

<400> 395

<210> 396 <210> 396

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 396

<400> 396

<210> 397 <210> 397

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 397

<400> 397

<210> 398 <210> 398

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 398

<400> 398

<210> 399 <210> 399

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 399

<400> 399

<210> 400 <210> 400

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 400

<400> 400

<210> 401 <210> 401

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 401

<400> 401

<210> 402 <210> 402

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 402

<400> 402

<210> 403 <210> 403

<211> 87 <211> 87

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 403

<400> 403

<210> 404 <210> 404

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 404

<400> 404

<210> 405 <210> 405

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 405

<400> 405

<210> 406 <210> 406

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 406

<400> 406

<210> 407 <210> 407

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 407

<400> 407

<210> 408 <210> 408

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 408

<400> 408

<210> 409 <210> 409

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 409

<400> 409

<210> 410 <210> 410

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 410

<400> 410

<210> 411 <210> 411

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 411

<400> 411

<210> 412 <210> 412

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 412

<400> 412

<210> 413 <210> 413

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 413

<400> 413

<210> 414 <210> 414

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 414

<400> 414

<210> 415 <210> 415

<211> 20 <211> 20

<212> DNA <212> DNA

<213> 人工序列 <213> Artificial sequence

<220> <220>

<223> 合成構建體 <223> Synthetic construct

<400> 415

<400> 415

Claims

A method for storing input data on a nucleic acid, characterized in that it includes: a) converting the input data into a set of nucleotide sequences, wherein the conversion includes i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences; and b) synthesizing a set of nucleic acids including the set of nucleotide sequences.

A computer-implemented method for converting input data into a set of nucleotide sequences, characterized in that it includes: i) a data processing step, which includes converting the input data into a binary string; and ii) a nucleotide encoding step, which includes converting the binary string using a 5-bit transcoding framework to obtain the set of nucleotide sequences.

The method as described in item 1 or 2 of the patent application scope, wherein the aforementioned data processing step includes dividing the aforementioned binary string into a sequence of non-overlapping 5-bit binary strings.

The method as described in item 3 of the patent application scope, wherein the aforementioned nucleotide coding step includes converting each 5-bit binary string into an integer ranging from 0 to 31 to obtain an integer string.

The method as described in item 4 of the patent application scope, wherein the nucleotide encoding step further includes converting the integer string with the 5-bit transcoding framework to obtain the nucleotide sequence group.

The method as described in item 4 of the patent application scope, wherein the aforementioned nucleotide coding step further includes dividing the aforementioned integer string into a plurality of initial integer subsequences having a predetermined length.

The method as described in item 6 of the patent application scope, wherein the length of each of the plurality of initial integer subsequences is determined based on the length of the oligomer of the selected synthesis platform, the required error tolerance, the size of the input data, the selected error correction code, or a combination thereof.

The method as described in item 6 or 7 of the patent application scope, wherein the nucleotide encoding step further includes adding index information to each of the plurality of initial integer subsequences to obtain a plurality of indexed integer subsequences.

The method as recited in item 8 of the patent application scope, wherein the index information added to each of the plurality of initial integer subsequences includes an integer sequence, wherein the length of the integer sequence is based on the size of the input data.
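
By way of illustration only, dividing the integer string into fixed-length subsequences and prefixing each with an index can be sketched as follows. The base-32 index representation, the zero-padding of the last subsequence, and all function names are assumptions made for this example; the claims state only that the index is an integer sequence whose length depends on the size of the input data.

```python
def index_length(num_subsequences: int) -> int:
    """Smallest number of base-32 digits able to label every subsequence."""
    length = 1
    while 32 ** length < num_subsequences:
        length += 1
    return length


def to_base32(value: int, length: int) -> list[int]:
    """Write `value` as `length` base-32 digits, most significant first."""
    if value >= 32 ** length:
        raise ValueError("value does not fit in the requested index length")
    digits = []
    for _ in range(length):
        digits.append(value % 32)
        value //= 32
    return digits[::-1]


def split_and_index(integers: list[int], payload_len: int) -> list[list[int]]:
    """Split into subsequences of `payload_len` integers and prepend an index.

    Assumption: the last subsequence is padded with zeros to full length.
    """
    chunks = [integers[i:i + payload_len]
              for i in range(0, len(integers), payload_len)]
    if chunks and len(chunks[-1]) < payload_len:
        chunks[-1] = chunks[-1] + [0] * (payload_len - len(chunks[-1]))
    width = index_length(len(chunks))
    return [to_base32(n, width) + chunk for n, chunk in enumerate(chunks)]
```

Because every value in the index is also in the range 0 to 31, the indexed subsequences can pass through the same 5-bit transcoding as the payload.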

The method as described in item 8 or 9 of the patent application scope, wherein the nucleotide encoding step includes, after adding the index information, adding redundant data to the plurality of indexed integer subsequences, thereby obtaining a plurality of integer subsequences with redundancy.

The method as described in item 10 of the patent application scope, wherein adding redundant data to the plurality of indexed integer subsequences includes: creating an empty matrix, wherein the number of columns in the empty matrix is greater than the size of the plurality of indexed integer subsequences, and wherein the number of rows in the empty matrix is greater than the number of integers in each of the plurality of indexed integer subsequences; filling the empty matrix with the plurality of indexed integer subsequences and with data generated by applying error correction coding; and obtaining the plurality of integer subsequences with redundancy based on the filled matrix.

The method as described in item 11 of the patent application scope, wherein the number of columns of the empty matrix is determined based on the length of the oligomer of the selected synthesis platform, the type of error correction code, the predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

The method as described in item 11 or 12 of the patent application scope, wherein the number of rows of the empty matrix is determined based on the length of the oligomer of the selected synthesis platform, the type of error correction code, the predetermined error tolerance value, the size of the plurality of indexed integer subsequences, or a combination thereof.

The method as described in any of items 11-13 of the patent application scope, wherein the aforementioned error correction code is a Reed-Solomon ("RS") code.

The method described in item 14 of the patent application scope, wherein the data generated by applying the error correction coding is generated by applying the string correction of the RS coding and/or the block correction of the RS coding.

The method as described in any of items 1-15 of the patent application scope, wherein the aforementioned 5-bit transcoding framework is based on Table 2.

The method as described in item 16 of the patent application scope, wherein the selection of R and Y is based on: 1) being different from the nucleotide immediately preceding R or Y; and/or 2) the estimated GC content of the nucleotide sequence.
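
By way of illustration only, one possible way of applying the two selection criteria for R and Y is sketched below. Table 2 is not reproduced in this section, so the sketch covers only the resolution of a degenerate symbol into a concrete base, using the standard IUPAC meanings (R is A or G; Y is C or T). The function name, the GC target of 0.5, and the order in which the two criteria are applied are assumptions.

```python
# For each degenerate symbol: (option that is not G/C, option that is G/C).
DEGENERATE = {"R": ("A", "G"), "Y": ("T", "C")}


def resolve_degenerate(symbol: str, sequence_so_far: str,
                       gc_target: float = 0.5) -> str:
    """Pick a concrete base for 'R' or 'Y'.

    Criterion 1: the chosen base differs from the preceding nucleotide.
    Criterion 2 (used only when both options satisfy criterion 1): choose
    the base that moves the running GC fraction toward `gc_target`.
    """
    at_option, gc_option = DEGENERATE[symbol]
    previous = sequence_so_far[-1:]  # empty string at the start of a sequence
    if previous == at_option:
        return gc_option
    if previous == gc_option:
        return at_option
    gc_count = sum(base in "GC" for base in sequence_so_far)
    gc_fraction = gc_count / len(sequence_so_far) if sequence_so_far else 0.0
    return gc_option if gc_fraction < gc_target else at_option
```

Since the choice within R or within Y carries no information in this sketch, a decoder can read A or G back as R and C or T back as Y.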

The method as described in any of items 1-17 of the patent application scope, wherein the aforementioned input data corresponds to a compressed file.

The method as described in any of items 1 to 18 of the patent application scope, wherein the aforementioned input data corresponds to two or more files.

The method as described in any of items 1-17 or 19 of the patent application scope, wherein the aforementioned input data corresponds to a text file.

The method as described in any one of items 1-20 of the patent application scope, wherein the aforementioned data processing step further includes compressing the aforementioned input data to obtain a compressed file and converting the aforementioned compressed file into a binary string.

The method as described in item 18 or 21 of the patent application scope, wherein the aforementioned compressed file is compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").

The method as described in item 19 of the patent application scope, wherein the aforementioned data processing step further includes grouping the two or more files into a TAR file.

The method as described in item 23 of the patent application scope, wherein the aforementioned TAR file is further compressed using the Lempel-Ziv-Markov chain algorithm ("LZMA").
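
By way of illustration only, grouping several files into a TAR archive, compressing the archive with LZMA, and converting the result into a binary string can be sketched with the Python standard library. The function names are hypothetical. Note that `lzma.compress` writes the `.xz` container by default, which is an assumption about the container format rather than something the claims specify.

```python
import io
import lzma
import tarfile


def pack_files(files: dict[str, bytes]) -> bytes:
    """Group named files into an in-memory TAR archive and LZMA-compress it."""
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return lzma.compress(tar_buffer.getvalue())


def unpack_files(blob: bytes) -> dict[str, bytes]:
    """Reverse of pack_files: decompress, then extract every regular file."""
    tar_bytes = lzma.decompress(blob)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return {member.name: archive.extractfile(member).read()
                for member in archive.getmembers() if member.isfile()}


def to_bit_string(blob: bytes) -> str:
    """Convert the compressed bytes into a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in blob)
```

The same pair of functions also illustrates the retrieval direction, in which the recovered binary string is turned back into bytes, decompressed, and split into the original files.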

The method as described in any one of items 1-24 of the patent application scope, wherein the aforementioned nucleotide encoding step further includes appending a pair of primer sequences to the 5' and 3' ends of each nucleotide sequence of the aforementioned nucleotide sequence group.

The method as described in item 1 of the patent application scope, which further includes attaching a primer pair to the synthetic nucleic acid group.

A method for storing two or more sets of input data on nucleic acids, characterized in that it includes: a) converting the two or more sets of input data into two or more corresponding sets of nucleotide sequences according to the method described in any one of items 2-19 of the patent application scope; b) appending a pair of primer sequences to the 5' and 3' ends, respectively, of each nucleotide sequence in the two or more corresponding sets of nucleotide sequences, wherein the primer pairs used for the two or more corresponding sets of nucleotide sequences are different from each other; and c) synthesizing two or more sets of nucleic acids respectively including the two or more corresponding sets of nucleotide sequences.

The method as described in item 27 of the patent application scope, wherein each primer pair has a sequence different from any of the two or more corresponding sets of nucleotide sequences or their complementary sequences.
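
By way of illustration only, appending flanking primer sequences and checking that a candidate primer does not occur in any payload sequence or in its complementary strand can be sketched as follows. The function names are hypothetical, and exact substring matching is a simplifying assumption; a practical primer design would also consider partial matches and melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def add_flanks(payload: str, flank_5: str, flank_3: str) -> str:
    """Append one flanking sequence to each end of a payload sequence."""
    return flank_5 + payload + flank_3


def primer_is_unique(primer: str, payloads: list[str]) -> bool:
    """True if the primer is absent from every payload and from the
    complementary strand of every payload (exact matches only)."""
    primer_rc = reverse_complement(primer)
    return not any(primer in p or primer_rc in p for p in payloads)
```

Testing for the reverse complement of the primer within a payload is equivalent to testing for the primer within the complementary strand of that payload.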

The method as described in any of items 1 or 3-28 of the patent application scope, wherein the GC content of the synthetic nucleic acid group ranges from 30% to 70%.

The method as described in any of items 1 or 3-29 of the patent application scope, wherein the GC content of the synthetic nucleic acid group is less than about 70%.
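
By way of illustration only, the GC content of a candidate sequence can be checked against the 30% to 70% range as sketched below. The function names are hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def within_gc_range(sequence: str, low: float = 0.30, high: float = 0.70) -> bool:
    """True if the GC content lies within [low, high]."""
    return low <= gc_content(sequence) <= high
```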

The method as described in item 1 of the patent application scope, which further includes storing the synthetic nucleic acid group.

The method as described in item 31 of the patent application scope, wherein the aforementioned synthetic nucleic acid group is stored by drying.

The method as described in item 32 of the patent application scope, wherein the aforementioned synthetic nucleic acid group is stored by lyophilization.

The method as described in item 31 of the patent application scope, wherein the aforementioned synthetic nucleic acid group is fixed on a carrier.

The method described in item 34 of the patent application scope, wherein the aforementioned carrier is a microarray.

A method for retrieving output data stored on a nucleic acid, characterized in that it includes: a) obtaining a set of nucleotide sequences of a set of nucleic acids; and b) converting the set of nucleotide sequences into the output data, wherein the conversion includes: i) a nucleotide decoding step, including converting the set of nucleotide sequences into a binary string using a 5-bit transcoding framework; and ii) a data processing step, including converting the binary string into the output data, thereby obtaining the output data.

The method as described in item 36 of the patent application scope, wherein the foregoing method includes amplifying the nucleic acid group before retrieving the output data.

The method as described in any one of the patent application items 36-37, further comprising sequencing the aforementioned nucleic acid group to generate multiple sequence reads.

The method as described in item 38 of the patent application scope, wherein the plurality of sequence reads are paired, combined and filtered to obtain the aforementioned nucleotide sequence group.

A computer-implemented method for converting a nucleotide sequence group into output data, which includes: i) a nucleotide decoding step, including using a 5-bit transcoding framework to convert the aforementioned nucleotide sequence group into a binary string; and ii) a data processing step, including converting the binary string into the aforementioned output data.

The method as described in any one of items 36-40 of the patent application scope, wherein the nucleotide decoding step includes converting the nucleotide sequence group into multiple integer subsequences including integers in the range of 0-31.

The method as described in item 41 of the patent application range, wherein the nucleotide decoding step further includes applying an error correction code to the plurality of integer subsequences, thereby obtaining the plurality of indexed integer subsequences.

The method as described in item 42 of the patent application scope, wherein the step of applying error correction coding includes: i) applying RS coding string correction to the plurality of integer subsequences to obtain a plurality of uniform integer subsequences; and ii) applying RS coding block correction to the plurality of uniform integer subsequences to obtain the plurality of indexed integer subsequences.

The method as recited in item 42 or 43 of the patent application range, wherein the nucleotide decoding step further includes removing the index from the plurality of indexed integer subsequences to obtain a plurality of core integer subsequences.

The method as described in item 44 of the patent application range, wherein the aforementioned nucleotide decoding step further includes combining the aforementioned core integer subsequences into an integer string.

The method as described in item 45 of the patent application scope, wherein the nucleotide decoding step further includes converting the integer string into a binary string.
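
By way of illustration only, removing the index from each indexed integer subsequence, combining the core subsequences in index order, and converting the resulting integer string into a binary string can be sketched as follows. The base-32 index layout and the function names are assumptions that mirror a hypothetical encoder; error correction is assumed to have been applied already.

```python
def from_base32(digits: list[int]) -> int:
    """Interpret a list of base-32 digits, most significant first."""
    value = 0
    for digit in digits:
        value = value * 32 + digit
    return value


def reassemble(indexed_subsequences: list[list[int]], index_len: int) -> list[int]:
    """Sort subsequences by their index, strip the index, and concatenate."""
    ordered = sorted(indexed_subsequences,
                     key=lambda sub: from_base32(sub[:index_len]))
    integers: list[int] = []
    for sub in ordered:
        integers.extend(sub[index_len:])
    return integers


def integers_to_bits(integers: list[int], width: int = 5) -> str:
    """Convert each integer (0..31 when width is 5) to a fixed-width bit group."""
    return "".join(f"{value:0{width}b}" for value in integers)
```

Any padding added during encoding would still have to be removed before the binary string is converted back into bytes; that bookkeeping is omitted here.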

The method as described in item 46 of the patent application scope, in which the aforementioned output data is stored in a compressed file.

The method as described in item 47 of the patent application scope, wherein the aforementioned data processing step further includes decompressing the aforementioned compressed file.

The method as described in item 48 of the patent application scope, wherein the decompression is performed through the LZMA algorithm.

The method as described in item 46 of the patent application scope, wherein the aforementioned output data corresponds to multiple files.

The method as described in item 50 of the patent application scope, further including extracting the plurality of files from the output data through the TAR algorithm.

The method as described in any of items 36-51 of the patent application range, wherein the aforementioned 5-bit transcoding framework is based on Table 2.

The method as described in any one of items 36-53 of the patent application scope, wherein the nucleic acid set includes primer sequences at the 3' and 5' ends and the method includes removing the primer sequences before the nucleotide decoding step.

A method for retrieving output data stored on a set of nucleic acids of interest, characterized in that the set of nucleic acids of interest is one of multiple sets of nucleic acids present in a mixture, each set encoding a different set of output data and having a different primer pair at the 3' and 5' ends, the method including: a) amplifying the set of nucleic acids of interest using the primer pair corresponding to the nucleic acids of interest; b) obtaining a set of nucleotide sequences of the amplified nucleic acids; and c) converting the set of nucleotide sequences into the output data according to the method described in any of items 40-52 of the patent application scope, thereby obtaining the output data.

A method for retrieving two or more corresponding sets of output data stored on two or more sets of nucleic acids of interest, characterized in that the two or more sets of nucleic acids of interest are among multiple sets of nucleic acids present in a mixture, each set encoding a different set of output data and having a different primer pair at the 3' and 5' ends, the method including: a) amplifying the two or more sets of nucleic acids of interest using the primer pairs corresponding to the two or more sets of nucleic acids of interest; b) obtaining two or more sets of nucleotide sequences of the amplified nucleic acids; and c) converting the two or more sets of nucleotide sequences into the two or more sets of output data according to the method described in any one of items 40-52 of the patent application scope, thereby obtaining the two or more sets of output data.
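
By way of illustration only, the effect of primer-specific selection followed by primer removal can be mimicked in software as sketched below. In practice the selection is performed by amplification of the physical mixture; the exact-match filter, the function name, and the example flanking sequences are assumptions made for this example.

```python
def select_by_flanks(reads: list[str], flank_5: str, flank_3: str) -> list[str]:
    """Keep reads that start with `flank_5` and end with `flank_3`,
    returning them with both flanking sequences removed."""
    selected = []
    minimum_length = len(flank_5) + len(flank_3)
    for read in reads:
        if (len(read) >= minimum_length
                and read.startswith(flank_5)
                and read.endswith(flank_3)):
            selected.append(read[len(flank_5):len(read) - len(flank_3)])
    return selected
```

Calling the function once per primer pair yields one set of payload sequences per stored data set, each of which can then be passed to the nucleotide decoding step.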

A non-transitory computer-readable storage medium storing one or more programs, characterized in that the one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to implement the method described in any one of items 2-36 or 40-52 of the patent application scope.

A system for providing nucleic acid-based data storage or retrieving data from a nucleic acid, characterized in that it includes: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for implementing the method described in any one of items 2-36 or 40-52 of the patent application scope.

An electronic device for providing nucleic acid-based data storage or retrieving data from a nucleic acid, characterized in that the device includes means for implementing the method described in any one of items 2-36 or 40-52 of the patent application scope.