KR20240124818A

KR20240124818A - Design platform of bi-specific nucleic acid molecules

Info

Publication number: KR20240124818A
Application number: KR1020240018580A
Authority: KR
Inventors: 이대훈; 최진우; 정재균; 계민정; 유중기; 엄기환
Original assignee: 경희대학교 산학협력단; (주)큐리진
Priority date: 2023-02-09
Filing date: 2024-02-07
Publication date: 2024-08-19
Also published as: WO2024167299A1

Abstract

The present invention relates to a platform for designing a dual target nucleic acid molecule that targets two genes and suppresses their expression simultaneously. By introducing mismatches at specific nucleotide positions of the dual target nucleic acid molecule, the present invention reduces the off-target problem and the biased gene expression suppression seen with conventional dual target nucleic acid molecules, and can therefore be used in the design of dual target nucleic acid molecules.

Description

{Design platform of bi-specific nucleic acid molecules}

The present invention relates to a platform for designing dual target nucleic acid molecules that target two genes and suppress their expression simultaneously.

Technology that suppresses gene expression is an important tool in therapeutic development and target validation. Since the role of RNA interference (hereinafter, RNAi) was discovered, it has been shown to act on mRNA in a sequence-specific manner in various types of mammalian cells (Silence of the transcripts: RNA interference in medicine. J Mol Med (2005) 83: 764-773). RNAi is a phenomenon in which a small interfering RNA (hereinafter, siRNA), a double-stranded RNA of 21 to 25 nucleotides, binds specifically to an mRNA transcript having a complementary sequence and degrades that transcript, thereby suppressing the expression of a specific protein. Inside the cell, double-stranded RNA is processed by the endonuclease Dicer into siRNA of 21 to 23 base pairs (bp). The siRNA binds to the RNA-induced silencing complex (RISC), and the guide (antisense) strand recognizes the target mRNA so that it is degraded, which inhibits expression of the target gene in a sequence-specific manner (NUCLEIC-ACID THERAPEUTICS: BASIC PRINCIPLES AND RECENT APPLICATIONS. Nature Reviews Drug Discovery. 2002. 1, 503-514). According to Bertrand et al., siRNA against a given target gene inhibits mRNA expression more strongly than an antisense oligonucleotide (ASO) against the same gene, both in vitro and in vivo, and the effect lasts longer (Comparison of antisense oligonucleotides and siRNAs in cell culture and in vivo. Biochem. Biophys. Res. Commun. 2002. 296: 1000-1004).
The market for RNAi-based therapeutics, including siRNA, was projected to exceed 12 trillion won worldwide around 2020. Because the range of targets to which the technology can be applied is far broader, it is regarded as a next-generation gene therapy technology capable of treating diseases that are difficult to treat with existing antibody-based and compound-based drugs. In addition, siRNA acts by binding complementarily to the target mRNA and regulating expression of the target gene in a sequence-specific manner. Compared with existing antibody-based drugs or small molecule drugs, which require long development periods and high development costs before they are optimized for a specific protein target, siRNA has the advantages that the range of applicable targets is greatly expanded, the development period is shortened, and optimized lead compounds can be developed for all protein targets, including targets that cannot be drugged (Progress Towards in Vivo Use of siRNAs. MOLECULAR THERAPY. 2006 13(4):664-670). Accordingly, RNA-mediated interference has been proposed as a solution to problems arising in conventional chemically synthesized drug development, and research is under way to use it to develop treatments for various diseases, especially tumors, by selectively inhibiting the expression of specific proteins at the transcript level. siRNA therapeutics also have the advantage over existing anticancer drugs that the target is well defined and side effects are predictable. However, in tumors, which are diseases caused by abnormalities in many genes, this target specificity can itself be a cause of low therapeutic efficacy. Moreover, controlling a single gene does not cure cancer, and resistance to anticancer drugs develops readily.
Therefore, when cancer is treated with gene therapy, it is difficult to control the cancer by targeting only a single gene. If, to address this, siRNAs against several genes are produced and introduced individually, the number that can be delivered in a vector is limited and off-target effects increase, making it difficult to achieve the intended effect. A sequence design is therefore needed in which a single nucleic acid sequence simultaneously targets several genes related to the disease of interest, but designing dual target nucleic acid molecules is very difficult because of sequence diversity, the possible presence of off-targets, and similar factors.

Currently, shRNA or miRNA generated from an encoded bi-cistronic sequence is used as an RNAi tool to suppress two genes, but in many cases a relatively long transcript is produced and the desired structure fails to form. There is also a method of encoding separate shRNAs under independent promoters using multiple promoters, but this leads to biased shRNA production, for example with only one of the two targeted genes being down-regulated. Therefore, a next-generation dual targeting method is needed that solves the off-target problem caused by unbalanced down-regulation of two different genes.

At present, one method for designing a bi-specific nucleic acid molecule is the method disclosed in Republic of Korea Application No. 10-2019-0160720. That method can design nucleic acid molecules that target the mRNAs of different genes, but it has two problems. 1) It designs the dual target nucleic acid molecule from sequence information obtained by segmenting the DNA sequence of a specific gene at fixed intervals; blank regions can arise during this segmentation, and because DNA sequences contain intron regions, design efficiency may be low. 2) The individual sequences of a bi-specific nucleic acid molecule usually tolerate mismatches, but the off-target effect and inhibition efficiency differ markedly depending on where the mismatches are located. Consequently, even when a bi-specific nucleic acid molecule has been designed, it may show off-target effects or low inhibition efficiency.

To solve the above problems, the inventors of the present invention 1) increased the design efficiency of bi-specific nucleic acid molecule sequences by using the mRNA sequence of a gene, rather than its DNA sequence, when designing the bi-specific nucleic acid molecule, and 2) identified mismatch positions that can maximize the effect of the bi-specific nucleic acid molecule, thereby completing the present invention.

Accordingly, an object of the present invention is to provide a dual target nucleic acid molecule comprising a mismatch at a specific position.

A further object of the present invention is to provide a method for designing a dual target nucleic acid molecule.

To achieve the above objects, the present invention provides a dual target nucleic acid molecule containing a mismatch at a specific nucleotide position.

In addition, the present invention provides a method for designing a dual target nucleic acid molecule without off-target effects.

In the present invention, introducing a mismatch at a specific nucleotide position of the dual target nucleic acid molecule reduces the off-target problem and the biased gene expression suppression of conventional dual target nucleic acid molecules, and this can therefore be used in the design of dual target nucleic acid molecules.

Figure 1 is a diagram illustrating the principle of the algorithm for deriving dual target siRNA candidates.
Figure 2 is a diagram showing the difference between a conventionally designed dual target siRNA pair and a dual target siRNA pair designed in the present invention to include a mismatch.
Figure 3 is a diagram analyzing the mismatch positions of the designed siRNA pairs and the efficiency with which each strand of the siRNA pair suppresses expression of its gene:
a: position of the mismatch in the siRNA pair, with the full sequence length taken as 1; and
b: mismatch position and gene expression suppression efficiency (KD_eff).
Figure 4 is a diagram showing, with respect to one siRNA strand, the mismatch positions at which gene expression suppression efficiency is optimized in a dual target siRNA.
Figure 5 is a diagram confirming the selectivity of siRNA gene expression suppression efficiency according to mismatch position and structure.
Figure 6 is a block diagram of a device (200) for designing a dual target nucleic acid molecule according to one embodiment.

Hereinafter, the present invention is described in detail by way of embodiments. The following embodiments are presented only as examples of the present invention and do not limit it; the present invention can be variously modified and applied within the scope of the claims set out below and of equivalents interpreted therefrom.

Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' direction. Numerical ranges recited in the specification include the numbers defining the range and include each integer or any non-integer fraction within the defined range.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in practice to test the present invention, the preferred materials and methods are described herein.

The terms used in the present invention have been chosen, as far as possible, from general terms currently in wide use, taking into account their function in the present invention, but they may vary with the intention of those working in the field, precedent, the emergence of new technologies, and the like. In certain cases a term has been selected arbitrarily by the applicant, in which case its meaning is described in detail in the relevant part of the description. Therefore, the terms used in the present invention should be defined on the basis of their meaning and the overall content of the present invention, not simply by their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit" and "module" used in the specification mean a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

In one aspect, the present invention relates to a dual target nucleic acid molecule comprising 1) a first nucleic acid molecule targeting the mRNA of a first gene; and 2) a second nucleic acid molecule targeting the mRNA of a second gene, wherein the first nucleic acid molecule and the second nucleic acid molecule are partially complementary, and wherein, taking the total nucleotide (nt) length of the first or second nucleic acid molecule as 1.00, a base pair between the first nucleic acid molecule and the second nucleic acid molecule is mismatched at a position corresponding to 0.25 to 0.40, or to 0.60 to 0.75, from the 5' end or the 3' end of the first or second nucleic acid molecule.

In one embodiment of the present invention, taking the total nucleotide (nt) length of the first or second nucleic acid molecule as 1.00, a base pair between the first nucleic acid molecule and the second nucleic acid molecule may be mismatched at a position corresponding to 0.25 to 0.30, or to 0.70 to 0.75, from the 5' end or the 3' end of the first or second nucleic acid molecule.
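The relative-position windows above can be turned into concrete nucleotide positions for a given strand length. The following is a minimal sketch, not the applicant's implementation: it assumes that the relative position is the 1-based nucleotide position divided by the strand length, which the specification does not state explicitly, and the names `allowed_positions`, `BROAD_WINDOWS` and `NARROW_WINDOWS` are hypothetical.

```python
from fractions import Fraction

# Windows taken from the text; exact fractions avoid floating-point edge cases.
BROAD_WINDOWS = ((Fraction(25, 100), Fraction(40, 100)),
                 (Fraction(60, 100), Fraction(75, 100)))
NARROW_WINDOWS = ((Fraction(25, 100), Fraction(30, 100)),
                  (Fraction(70, 100), Fraction(75, 100)))


def allowed_positions(length, windows=BROAD_WINDOWS):
    """Return the 1-based positions whose relative position falls in a window.

    Assumption (not stated in the source): relative position = position / length.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    return [
        pos
        for pos in range(1, length + 1)
        if any(lo <= Fraction(pos, length) <= hi for lo, hi in windows)
    ]


if __name__ == "__main__":
    # Positions for a 20-mer under the broad and the narrow windows.
    print(allowed_positions(20))
    print(allowed_positions(20, NARROW_WINDOWS))
```

Under this assumption a 20mer admits positions 5 to 8 and 12 to 15 for the broad windows, and positions 5, 6, 14 and 15 for the narrow ones.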

In one embodiment, the first nucleic acid molecule may have a sequence that is 100% complementary to the mRNA of the first gene, and the second nucleic acid molecule may have a sequence that is 100% complementary to the mRNA of the second gene.

In one embodiment, 1 to 3 base pairs between the first nucleic acid molecule and the second nucleic acid molecule may be mismatched.

In one embodiment, the first nucleic acid molecule or the second nucleic acid molecule may be a 14 to 30mer, more preferably a 17 to 24mer.

In one embodiment, the dual target nucleic acid molecule may be a double-stranded (ds) small interfering RNA (siRNA) or an shRNA.

In one embodiment, when the dual target nucleic acid molecule is a 20mer, 1) the base at the 7th position from the 5' end of the first nucleic acid molecule and the base at the 14th position from the 5' end of the second nucleic acid molecule may be mismatched; 2) the base at the 9th position from the 5' end of the first nucleic acid molecule and the base at the 12th position from the 5' end of the second nucleic acid molecule may be mismatched; and 3) the base at the 14th position from the 5' end of the first nucleic acid molecule and the base at the 7th position from the 5' end of the second nucleic acid molecule may be mismatched.
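The three position pairs listed for the 20mer (7/14, 9/12 and 14/7) are each what antiparallel pairing gives when the duplex is blunt-ended: position i on one strand faces position 21 - i on the other. The sketch below illustrates this and locates mismatches between two strands. It assumes equal-length RNA strands with no overhangs and counts anything other than an A:U or G:C pair, including G:U wobble, as a mismatch; these are simplifying assumptions rather than statements from the specification, and the function names are hypothetical.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def partner_position(position, length):
    """1-based position on the opposite strand facing `position` in a blunt duplex."""
    if not 1 <= position <= length:
        raise ValueError("position out of range")
    return length + 1 - position


def mismatch_pairs(strand1, strand2):
    """Return (pos_on_strand1, pos_on_strand2) for every non-Watson-Crick pair.

    Both strands are written 5'->3'; both positions are counted from the 5' end
    of their own strand.
    """
    if len(strand1) != len(strand2):
        raise ValueError("this sketch assumes equal-length strands")
    n = len(strand1)
    pairs = []
    for i in range(1, n + 1):
        j = partner_position(i, n)
        if COMPLEMENT[strand1[i - 1]] != strand2[j - 1]:
            pairs.append((i, j))
    return pairs
```

For any pair of 20mers, a mismatch reported at position 7 of the first strand is therefore reported at position 14 of the second, consistent with the pairs listed above.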

In one embodiment, the nucleic acid molecule may be a double-stranded siRNA in which a single-stranded siRNA that is the first nucleic acid molecule, targeting the mRNA of the first gene, and a single-stranded siRNA that is the second nucleic acid molecule, targeting the mRNA of the second gene, are partially complementarily bound to each other. In this double-stranded (ds) siRNA, the sense strand (guide RNA) may be an siRNA that binds complementarily to the mRNA of the first gene, and the antisense strand (passenger RNA) may be an siRNA that binds complementarily to the mRNA of the second gene.

In one embodiment, the first nucleic acid molecule can inhibit expression of the first gene by RNA interference, and the second nucleic acid molecule can inhibit expression of the second gene by RNA interference, so that the nucleic acid molecule of the present invention can inhibit the expression of two genes simultaneously.

In one embodiment, the nucleic acid of the siRNA may comprise a nucleotide analog in which the sugar or backbone of one or more nucleotides is modified.

In one embodiment, the sugar modification may be one in which the hydroxyl group (-OH) at the 2' carbon position of ribose is substituted with a methyl group (-CH₃) (OMe), a methoxy group (-OCH₃), an amine group (-NH₂), fluorine (-F) (fluoro), an O-2-methoxyethyl group (MOE), an O-propyl group, an O-2-methylthioethyl group, an O-3-aminopropyl group, an O-3-dimethylaminopropyl group, an O-N-methylacetamido group or an O-dimethylamidooxyethyl group, and it is more preferable that the 2'-hydroxyl group of ribose is substituted with 2'-MOE (2'-O-methoxyethyl), 2'-OMe (2'-O-methyl) or 2'-F (fluoro).

In one embodiment, the backbone modification may be one in which the phosphate backbone of the nucleotide is modified to phosphorothioate, phosphorodithioate, methyl phosphonate, alkylphosphonate, phosphoroamidate or boranophosphate, more preferably to phosphorothioate.

The term "reverse complement sequence" (or "reverse-complemented sequence") as used in the present invention means the sequence of the strand opposite to a gene sequence written 5' to 3', itself written 5' to 3'. More specifically, a gene sequence is conventionally shown as only one strand of the double-stranded gene, written in the 5' to 3' direction from the promoter position. When the gene is transcribed, the sequence complementary to that displayed 5' to 3' sequence, that is, the 3' to 5' sequence, is used as the template, so that the 5' to 3' sequence read from the promoter position matches the sequence of the mRNA transcript (which is single-stranded). The sequence identical to the mRNA is called the sense sequence, and the sequence complementary to the sense sequence is called the antisense sequence. The sequence of the strand used as the template, written in the 5' to 3' direction, is called the reverse complement sequence or reverse-complemented sequence. That is, when a gene sequence is written as 5'-ATGCATGC-3', its reverse complement sequence is 5'-GCATGCAT-3'.
Also, if the gene sequence is 5'-ATGCATGC-3', the sequence transcribed into mRNA is 5'-GCAUGCAU-3', and the antisense sequence of the transcribed mRNA is 5'-AUGCAUGC-3'.
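The reverse complement operation defined above can be checked mechanically. The following is a minimal sketch using only the Python standard library; the function name `reverse_complement` and its `rna` flag are illustrative choices, not part of the specification.

```python
DNA_TABLE = str.maketrans("ACGT", "TGCA")
RNA_TABLE = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq, rna=False):
    """Complement every base, then reverse, so the result also reads 5'->3'."""
    seq = seq.upper()
    alphabet = "ACGU" if rna else "ACGT"
    if any(base not in alphabet for base in seq):
        raise ValueError(f"sequence contains bases outside {alphabet}")
    table = RNA_TABLE if rna else DNA_TABLE
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    # The two pairs quoted in the definition above.
    print(reverse_complement("ATGCATGC"))            # GCATGCAT
    print(reverse_complement("GCAUGCAU", rna=True))  # AUGCAUGC
```

Both printed results reproduce the sequence pairs given in the definition: 5'-GCATGCAT-3' for the DNA example and 5'-AUGCAUGC-3' as the antisense of 5'-GCAUGCAU-3'.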

The term "nucleic acid molecule" as used in the present invention refers to deoxyribonucleotides or ribonucleotides existing in single-stranded or double-stranded form and, unless specifically stated otherwise, includes analogs of natural nucleotides (Scheit, Nucleotide Analogs, John Wiley, New York (1980); Uhlmann and Peyman, Chemical Reviews, 90:543-584 (1990)).

The term "expression suppression" as used in the present invention means causing a decrease in the expression (into mRNA) or translation (into protein) of a target gene, and preferably means that expression of the target gene thereby becomes undetectable or is present only at an insignificant level.

The term "siRNA (small interfering RNA)" as used in the present invention refers to a short double-stranded RNA that can induce RNAi (RNA interference) through cleavage of a specific mRNA. In general, an siRNA consists of a sense RNA strand having a sequence homologous to the mRNA of the target gene and an antisense RNA strand having a sequence complementary to it. In the dual target siRNA of the present invention, however, the sense RNA strand is an antisense strand for the first gene and the antisense RNA strand is an antisense strand for the second gene, so that the two strands of the siRNA simultaneously suppress expression of the first gene and the second gene, respectively. The dual target siRNA is therefore provided as an efficient gene knock-down method or as a method of gene therapy.

The term "shRNA (short hairpin RNA)" as used in the present invention refers to a single-stranded RNA that contains a partially palindromic base sequence, so that it has a double-stranded structure in the 3' region and forms a hairpin-like structure, and that, after being expressed in a cell, can be cleaved by Dicer, a type of RNase present in the cell, and converted into siRNA. The length of the double-stranded structure is not particularly limited, but is preferably 10 nucleotides or more, and more preferably 20 nucleotides or more. In the present invention, the shRNA may be included in an expression cassette. The shRNA can be produced by converting U to T in the set of siRNA antisense and sense strand sequences for each gene, then sequentially linking, to the 3' end of the sense strand, TTGGATCCAA (TTGGATCCAA loop) or TTCAAGAGAG (TTCAAGAGAG loop), the antisense strand, and TT, to construct an expression cassette encoding the shRNA, and expressing this cassette in a cell.
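The cassette assembly order described above (sense strand, loop, antisense strand, TT, after converting U to T) can be sketched as plain string concatenation. This is an illustration only: it produces just the shRNA-encoding insert, leaves out the promoter and any cloning sites, and the function name `build_shrna_insert` is hypothetical. The two loop sequences are copied from the text as written.

```python
# Loop sequences exactly as given in the description.
LOOPS = ("TTGGATCCAA", "TTCAAGAGAG")


def rna_to_dna(seq):
    """Convert an RNA sequence to its DNA spelling (U -> T)."""
    return seq.upper().replace("U", "T")


def build_shrna_insert(sense_rna, antisense_rna, loop=LOOPS[0]):
    """Return sense + loop + antisense + 'TT', all written 5'->3' as DNA."""
    if loop not in LOOPS:
        raise ValueError("loop must be one of the two sequences given in the text")
    if not sense_rna or not antisense_rna:
        raise ValueError("both strands are required")
    return rna_to_dna(sense_rna) + loop + rna_to_dna(antisense_rna) + "TT"
```

With a 20mer sense strand and a 20mer antisense strand, the resulting insert is 20 + 10 + 20 + 2 = 52 nucleotides long.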

In one embodiment, the base sequence targeting the first gene may include a base sequence having at least 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% complementarity to the reverse complement sequence of the base sequence targeting the second gene, and the base sequence targeting the second gene may include a base sequence having at least 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% complementarity to the reverse complement sequence of the base sequence targeting the first gene.

Variants of the base sequence targeting the first gene or of the base sequence targeting the second gene are included within the scope of the present invention. The expression cassette of the present invention is a concept that includes functional equivalents of the nucleic acid molecules constituting it, for example variants in which part of the base sequence of the nucleic acid molecule has been modified by deletion, substitution or insertion but which can act in a functionally identical way to the original base sequence. The "% sequence homology" of a nucleic acid molecule is determined by comparing two optimally aligned sequences over a comparison region, where the portion of the nucleic acid molecule sequence in the comparison region may include additions or deletions (i.e., gaps) relative to the reference sequence (which contains no additions or deletions) for the optimal alignment of the two sequences.

In one aspect, the present invention relates to a recombinant expression vector that expresses the nucleic acid molecule of the present invention.

The recombinant vector of the present invention can be produced by recombinant DNA methods known in the art.

Non-viral vectors useful for delivering the dual target nucleic acid molecule in the present invention include all vectors commonly used in gene therapy, for example various plasmids capable of expression in eukaryotic cells, and liposomes.

본 발명에서 이중 타겟 핵산 분자인 siRNA가 전달된 세포에서 적절히 전사되게 하기 위해서는 이를 포함하는 shRNA를 암호화하는 염기서열이 적어도 프로모터에 작동가능하게 연결되는 것이 바람직하다. 상기 프로모터는 진핵세포에서 기능할 수 있는 프로모터라면 어떤 것이든지 무방하다. 이중 타겟 핵산 분자인 siRNA 또는 shRNA의 효율적인 전사를 위하여 필요에 따라 리더 서열, 폴리아데닐화 서열, 프로모터, 인핸서, 업스트림 활성화 서열, 신호펩타이드 서열 및 전사 종결인자를 비롯한 조절서열을 추가로 포함할 수도 있다.In order for the dual target nucleic acid molecule siRNA of the present invention to be properly transcribed in the delivered cell, it is preferable that at least the base sequence encoding the shRNA including it be operably linked to a promoter. Any promoter that can function in a eukaryotic cell may be used as the promoter. In order to efficiently transcribe the dual target nucleic acid molecule siRNA or shRNA, a regulatory sequence including a leader sequence, a polyadenylation sequence, a promoter, an enhancer, an upstream activating sequence, a signal peptide sequence, and a transcription terminator may be additionally included as needed.

Viruses or viral vectors useful for delivering the dual target nucleic acid molecule in the present invention include, but are not limited to, Baculoviridae, Parvoviridae, Picornaviridae, Herpesviridae, Poxviridae, and Adenoviridae.

In one aspect, the present invention relates to a method for designing a dual target nucleic acid molecule, comprising the step of selecting, from a pre-built database of dual target nucleic acid molecules, those in which a base pair between the first nucleic acid molecule and the second nucleic acid molecule is mismatched at a position corresponding to 0.25 to 0.40 or to 0.60 to 0.75 from the 5' or 3' end of the first or second nucleic acid molecule, taking the total nucleotide (nt) length of that molecule as 1.00.

일 구현예에서, 제 1 핵산 분자 또는 제 2 핵산 분자의 전체 뉴클레오타이드(nucleotide, nt) 길이 1.00에 대해 제 1 핵산 분자 또는 제 2 핵산 분자의 5' 말단 또는 3' 말단으로부터 0.25 내지 0.30에 해당하는 위치 또는 0.70 내지 0.75에 해당하는 위치에서 제 1 핵산 분자와 제 2 핵산 분자의 염기쌍이 미스매치(mismatch)된 것일 수 있다.In one embodiment, a base pair mismatch between the first nucleic acid molecule and the second nucleic acid molecule may occur at a position that is 0.25 to 0.30, or a position that is 0.70 to 0.75 from the 5' end or the 3' end of the first nucleic acid molecule or the second nucleic acid molecule with respect to a total nucleotide (nt) length of 1.00 of the first nucleic acid molecule or the second nucleic acid molecule.

일 구현예에서, 상기 이중 타겟 핵산 분자의 길이가 20mer인 경우, 1) 제 1 핵산 분자의 5' 말단으로부터 7번째 위치의 염기 및 제 2 핵산 분자의 5' 말단으로부터 14번째 위치의 염기가 미스매치되고; 2) 제 1 핵산 분자의 5' 말단으로부터 9번째 위치의 염기 및 제 2 핵산 분자의 5' 말단으로부터 12번째 위치의 염기가 미스매치되며; 및 3) 제 1 핵산 분자의 5' 말단으로부터 14번째 위치의 염기 및 제 2 핵산 분자의 5' 말단으로부터 7번째 위치의 염기가 미스매치될 수 있다.In one embodiment, when the length of the dual target nucleic acid molecule is 20mer, 1) the base at the 7th position from the 5' end of the first nucleic acid molecule and the base at the 14th position from the 5' end of the second nucleic acid molecule may be mismatched; 2) the base at the 9th position from the 5' end of the first nucleic acid molecule and the base at the 12th position from the 5' end of the second nucleic acid molecule may be mismatched; and 3) the base at the 14th position from the 5' end of the first nucleic acid molecule and the base at the 7th position from the 5' end of the second nucleic acid molecule may be mismatched.
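The relationship between the relative windows (0.25 to 0.40 and 0.60 to 0.75) and the 20mer positions listed above can be checked with a short Python sketch. It is illustrative only: the function names are hypothetical, and it assumes that a 1-based position p on a strand of n nucleotides corresponds to the relative position p / n, measured from either end.

```python
from fractions import Fraction

# Windows quoted in the text, as exact fractions of the full strand length.
BROAD_WINDOWS = (
    (Fraction(25, 100), Fraction(40, 100)),
    (Fraction(60, 100), Fraction(75, 100)),
)


def relative_positions(pos_from_5p: int, length: int) -> tuple[Fraction, Fraction]:
    """Relative position of a base measured from the 5' end and from the 3' end.

    Assumption (not stated in the source): position p (1-based) on a strand
    of n nucleotides maps to p / n.
    """
    if not 1 <= pos_from_5p <= length:
        raise ValueError("position must lie within the strand")
    from_5p = Fraction(pos_from_5p, length)
    from_3p = Fraction(length - pos_from_5p + 1, length)
    return from_5p, from_3p


def in_mismatch_window(pos_from_5p: int, length: int, windows=BROAD_WINDOWS) -> bool:
    """True if the position falls in any window when measured from either end."""
    return any(
        lo <= rel <= hi
        for rel in relative_positions(pos_from_5p, length)
        for lo, hi in windows
    )


# Positions named for the 20mer case: 7, 9, 12, and 14 from the 5' end.
for position in (7, 9, 12, 14):
    print(position, in_mismatch_window(position, 20))
```

Under this assumption, position 9 of a 20mer (0.45 from the 5' end) qualifies only because it is position 12 (0.60) when counted from the 3' end, which is consistent with the text allowing either end as the reference.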

In one embodiment, the dual target nucleic acid molecule database may be generated through a segmentation step in which the sequence of the mRNA of the first gene is segmented, starting from its 5' end, into segments 15 to 30 mer in length. In the segmentation step, segments of 15 to 30 mer are taken repeatedly, with the starting point shifted by 1 nt (nucleotide) at a time from the 5' end toward the 3' end of the mRNA.

In one embodiment, the dual target nucleic acid molecule database may be generated through a segmentation step in which the sequence of the mRNA of the second gene is segmented, starting from its 3' end, into segments 15 to 30 mer in length. In the segmentation step, segments of 15 to 30 mer are taken repeatedly, with the starting point shifted by 1 nt at a time from the 3' end toward the 5' end of the mRNA.

일 구현예에서, 상기 세그멘트화 단계의 서열 정보의 생성은 17 내지 21mer의 길이로 생성되는 것일 수 있다.In one embodiment, the sequence information generated in the segmentation step may be generated in a length of 17 to 21 mer.
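The sliding-window segmentation described above can be sketched in Python as follows. This is a minimal illustration rather than the applicants' implementation: the function names are hypothetical, and representing a segment read from the 3' end as a reversed string is an assumption made for convenience.

```python
def segment_from_5p(mrna: str, k: int) -> list[tuple[int, str]]:
    """All k-mers of `mrna`, with the window start moving 1 nt at a time
    from the 5' end toward the 3' end.

    Returns (offset from the 5' end, segment written 5'->3') pairs.
    """
    return [(i, mrna[i:i + k]) for i in range(len(mrna) - k + 1)]


def segment_from_3p(mrna: str, k: int) -> list[tuple[int, str]]:
    """All k-mers of `mrna`, with the window start moving 1 nt at a time
    from the 3' end toward the 5' end.

    Returns (offset from the 3' end, segment written 3'->5') pairs.
    Assumption: a segment read 3'->5' is stored as the reversed string.
    """
    reversed_mrna = mrna[::-1]
    return [(i, reversed_mrna[i:i + k]) for i in range(len(reversed_mrna) - k + 1)]


def segment_all_lengths(mrna: str, k_min: int = 15, k_max: int = 30,
                        from_3p: bool = False) -> dict[int, list[tuple[int, str]]]:
    """Segment at every length from k_min to k_max (15 to 30 mer by default)."""
    segmenter = segment_from_3p if from_3p else segment_from_5p
    return {k: segmenter(mrna, k) for k in range(k_min, k_max + 1)}


# Narrower range from the text: 17 to 21 mer, on a hypothetical 25-nt input.
toy_mrna = "AUGGCUACGUAGCUAGCUAGGCUAA"
by_length = segment_all_lengths(toy_mrna, k_min=17, k_max=21)
print({k: len(segments) for k, segments in by_length.items()})
```

An mRNA of n nucleotides yields n - k + 1 segments at each length k, so the database grows roughly linearly with transcript length for each chosen segment size.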

일 구현예에서, 상기 이중 타겟 핵산 분자 데이터베이스의 생성은, 참고문헌인 대한민국 특허출원 10-2019-0160720의 이중 타겟 핵산분자의 설계방법에 의하여 생성할 수 있다.In one embodiment, the generation of the dual target nucleic acid molecule database can be generated by the design method of a dual target nucleic acid molecule of the reference document, Republic of Korea Patent Application No. 10-2019-0160720.

In addition, in one embodiment, the method may include: (1) a segmentation step in which the sequence of the mRNA of the first gene is segmented, starting from its 5' end, into segments 15 to 30 mer in length, where segments of 15 to 30 mer are taken repeatedly with the starting point shifted by 1 nt (nucleotide) at a time from the 5' end toward the 3' end of the mRNA;

(2) a segmentation step in which the sequence of the mRNA of the second gene is segmented, starting from its 3' end, into segments 15 to 30 mer in length, where segments of 15 to 30 mer are taken repeatedly with the starting point shifted by 1 nt at a time from the 3' end toward the 5' end of the mRNA; and

(3) a step of aligning the sequence information generated from the mRNA of the first gene with the sequence information generated from the mRNA of the second gene.

일 구현예에서, 얼라이먼트하는 단계 이후에 매칭된 서열을 점수화하는 단계를 추가로 포함할 수 있다.In one implementation, the step of scoring the matched sequences may be additionally included after the step of aligning.

In one embodiment, the scoring step may assign a weight of 1 point in each of the following cases:
1) the segmented sequence is not located within 75 nt of the start codon;
2) the content of G and C bases is 36% to 52% of the total number of bases;
3) the number of GC repeats is less than 3;
4) the number of AT repeats is less than 4;
5) the segmented sequence is a sense sequence and a G or C base is present at the first position;
6) the segmented sequence is a sense sequence and an A base is present at the 3rd position;
7) the segmented sequence is a sense sequence and a T or U base is present at the 10th position;
8) the segmented sequence is a sense sequence and no G base is present at the 13th position;
9) the segmented sequence is a sense sequence and an A base is present at the 19th position;
10) the segmented sequence is a sense sequence and no G or C base is present at the 19th position;
11) the segmented sequence is an antisense sequence and no A, T, or U base is present at the first position;
12) the segmented sequence is an antisense sequence and a G base is present at the first position;
13) the segmented sequence is an antisense sequence and an A base is present at the 6th position;
14) the segmented sequence is an antisense sequence and the base at the 19th position is not G or C;
15) the segmented sequence is an antisense sequence and the base at the 19th position is U or T; and
16) the segmented sequence is located in a coding sequence (CDS).
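A subset of these criteria can be expressed directly in code. The Python sketch below covers only the criteria that depend on the sense sequence alone (2 and 5 to 10); the criteria that need transcript context (1 and 16), the repeat criteria (3 and 4, whose exact definition the text does not give), and the antisense criteria are omitted. The function names are hypothetical, and treating a position beyond the end of a short sequence as "base absent" is an assumption.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in ("G", "C") for base in seq) / len(seq)


def base_at(seq: str, position: int) -> str:
    """Base at a 1-based position, or '' if the sequence is shorter than that."""
    return seq[position - 1].upper() if len(seq) >= position else ""


def sense_score(seq: str) -> int:
    """Score a sense sequence with 1 point per satisfied criterion.

    Implements only criteria 2 and 5-10 from the list; this is a partial,
    illustrative scorer, not the full 16-point scheme.
    """
    checks = [
        0.36 <= gc_fraction(seq) <= 0.52,        # 2) GC content 36% to 52%
        base_at(seq, 1) in ("G", "C"),           # 5) G or C at position 1
        base_at(seq, 3) == "A",                  # 6) A at position 3
        base_at(seq, 10) in ("T", "U"),          # 7) T or U at position 10
        base_at(seq, 13) != "G",                 # 8) no G at position 13
        base_at(seq, 19) == "A",                 # 9) A at position 19
        base_at(seq, 19) not in ("G", "C"),      # 10) no G or C at position 19
    ]
    return sum(checks)


# Hypothetical 19-nt sequence constructed to satisfy all seven checks.
print(sense_score("GCAUGCAUGUACAUGCAUA"))
```

Criteria 9 and 10 overlap: an A at position 19 earns both points, so under the listed scheme that position is effectively double-weighted.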

일 구현예에서, 상기 점수화하는 단계는 세그멘트화된 서열과 상기 세그멘트화된 서열과 매칭된 서열의 전체 뉴클레오타이드(nucleotide, nt) 길이 1.00에 대해 세그멘트화된 서열 또는 상기 세그멘트화된 서열과 매칭된 서열의 5' 말단 또는 3' 말단으로부터 0.25 내지 0.40에 해당하는 위치 또는 0.60 내지 0.75에 해당하는 위치에 미스매치(mismatch)가 존재하는 경우 가중치를 주는 것일 수 있다.In one embodiment, the scoring step may be to weight a mismatch at a position corresponding to 0.25 to 0.40, or a position corresponding to 0.60 to 0.75 from the 5' end or the 3' end of the segmented sequence or the sequence matched with the segmented sequence, with respect to a total nucleotide (nt) length of 1.00 of the segmented sequence and the sequence matched with the segmented sequence.

바람직하게는 상기 세그멘트화된 서열과 매칭된 서열의 5' 말단 또는 3' 말단으로부터 0.25 내지 0.30에 해당하는 위치 또는 0.70 내지 0.75에 해당하는 위치에 미스매치(mismatch)가 존재하는 경우 가중치를 주는 것일 수 있다.Preferably, a weight may be given when a mismatch exists at a position corresponding to 0.25 to 0.30 or a position corresponding to 0.70 to 0.75 from the 5' end or the 3' end of the sequence matching the segmented sequence.

When a mismatch exists at the specified position of the segmented sequence and its matched sequence, the off-target effect of the segmented sequence and of the "matched sequence" on genes other than their respective intended target genes can be eliminated. As used in the present invention, the term "off-target effect" means the effect of targeting a gene other than the intended target gene.

In one embodiment, the method may include, after the scoring step, a step of sorting the candidates from the highest score to the lowest. In addition, the method may further include, after the scoring step, a step of aligning the segmented sequence and the sequence matched to it by alignment each against the sequences in a pre-built database of gene transcripts, and selecting the pair as a dual target nucleic acid molecule when neither sequence matches any gene transcript other than the transcript of the gene from which it is derived.
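The ranking and off-target filtering steps can be sketched as below. This is an illustration under stated assumptions: the `Candidate` record, the function names, and the toy transcript dictionary are hypothetical, and an exact substring search stands in for the alignment against a transcript database that the text describes.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")


@dataclass(frozen=True)
class Candidate:
    first_seq: str    # strand targeting the first gene's transcript, 5'->3'
    second_seq: str   # strand targeting the second gene's transcript, 5'->3'
    first_gene: str
    second_gene: str
    score: int


def reverse_complement(seq: str) -> str:
    """Reverse complement in the RNA alphabet (T is treated like U)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def genes_hit(strand: str, transcripts: dict[str, str]) -> set[str]:
    """Genes whose transcript contains the site fully complementary to `strand`.

    Assumption: exact substring matching replaces a real alignment search.
    """
    site = reverse_complement(strand)
    return {gene for gene, mrna in transcripts.items() if site in mrna}


def select_candidates(candidates: list[Candidate],
                      transcripts: dict[str, str]) -> list[Candidate]:
    """Sort by score (high to low) and keep pairs with no off-target transcript."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [
        c for c in ranked
        if genes_hit(c.first_seq, transcripts) <= {c.first_gene}
        and genes_hit(c.second_seq, transcripts) <= {c.second_gene}
    ]


# Toy transcript database (hypothetical gene names and sequences).
toy_transcripts = {"GENE1": "CCAUGAUCC", "GENE2": "GGGUCAUGG"}
toy_candidates = [
    Candidate("AUCAU", "AUGAC", "GENE1", "GENE2", score=3),
    Candidate("AUCAU", "AUGAC", "GENE1", "GENE2", score=5),
]
print([c.score for c in select_candidates(toy_candidates, toy_transcripts)])
```

A production pipeline would tolerate mismatches in the off-target search, since partially complementary sites can also be silenced; exact matching is used here only to keep the sketch short.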

In one embodiment, the alignment may use the striped Smith-Waterman algorithm, the Needleman-Wunsch algorithm, a Levenshtein distance algorithm, a heuristic algorithm, or a Hamming distance algorithm, preferably the striped Smith-Waterman algorithm, but is not limited thereto. In addition, in this step, only those matched sequences whose gene of origin (referred to as the second gene) is associated with the same disease as the first gene may be selected for the subsequent steps.
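Of the algorithms listed, the Hamming distance is the simplest to illustrate. The Python sketch below pairs equal-length segments that differ at no more than a chosen number of positions. It is not the preferred striped Smith-Waterman method; the function names and the `max_mismatches` parameter are hypothetical.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming comparison needs equal-length sequences")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def match_segments(first_segments: list[str], second_segments: list[str],
                   max_mismatches: int = 1) -> list[tuple[str, str, list[int]]]:
    """Pair segments whose Hamming distance is at most `max_mismatches`.

    Both inputs must be written in the same orientation. Returns
    (first segment, second segment, mismatch positions) tuples.
    """
    matches = []
    for a in first_segments:
        for b in second_segments:
            if len(a) != len(b):
                continue  # Hamming distance is undefined for unequal lengths
            positions = mismatch_positions(a, b)
            if len(positions) <= max_mismatches:
                matches.append((a, b, positions))
    return matches


# Toy input: two 5-nt sequences differing only at the last position.
print(match_segments(["UACUA"], ["UACUG", "GGGGG"]))
```

Because the Hamming distance compares position by position, it cannot represent insertions or deletions; the Smith-Waterman, Needleman-Wunsch, and Levenshtein approaches named in the text handle gaps.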

일 구현예에서, 상기 얼라이먼트하는 단계는, 세그멘트화된 서열의 상보서열(complementary sequence)을 이용하는 것일 수 있다.In one embodiment, the aligning step may utilize a complementary sequence of the segmented sequence.

More specifically, in one embodiment of the present invention, (1) the sequence complementary to a segment obtained by segmenting the mRNA sequence of the first gene in the 5' to 3' direction and (2) a segment obtained by segmenting the mRNA sequence of the second gene in the 3' to 5' direction may be aligned with each other. For example, if the sequence of (1) is 5'-AUGAU-3', its complementary sequence is 3'-UACUA-5', which can match a sequence of (2) of 3'-UACUG-5' (with a mismatch between A and G at the 5' end). If this match obtains a score exceeding a predetermined threshold, it becomes a candidate sequence for a dual target nucleic acid molecule. When the siRNA or shRNA is designed, the sequence targeting the mRNA of the first gene may then be 5'-AUCAU-3', and the sequence targeting the mRNA of the second gene may be 3'-CAGUA-5', i.e., 5'-AUGAC-3'. That is, the finally derived sequences are 5'-AUCAU-3' (referred to as the first sequence), which targets the mRNA of the first gene, and 5'-AUGAC-3' (referred to as the second sequence), which targets the mRNA of the second gene. The two sequences bind to each other with partial complementarity, with the A at the 5' end of the first sequence mismatched against the C at the 3' end of the second sequence, and thus have the potential to act as a dual target nucleic acid molecule.

In addition, in another embodiment of the present invention, (1) a segment obtained by segmenting the mRNA sequence of the first gene in the 5' to 3' direction and (2) the sequence complementary to a segment obtained by segmenting the mRNA sequence of the second gene in the 3' to 5' direction may be aligned. For example, when the sequence of (1) is 5'-AUGAU-3' and the sequence of (2) is 3'-UACUG-5', the complement of (2) is 5'-AUGAC-3', and the two sequences can match each other (with a mismatch between U and C at the 3' end). If this match obtains a score exceeding a predetermined threshold, it becomes a candidate sequence for a dual target nucleic acid molecule. When the siRNA or shRNA is designed, the sequence targeting the mRNA of the first gene may then be 5'-AUCAU-3', and the sequence targeting the mRNA of the second gene may be 5'-AUGAC-3'. That is, the finally derived sequences are 5'-AUCAU-3' (referred to as the first sequence), which targets the mRNA of the first gene, and 5'-AUGAC-3' (referred to as the second sequence), which targets the mRNA of the second gene. The two sequences bind to each other with partial complementarity, with the A at the 5' end of the first sequence mismatched against the C at the 3' end of the second sequence, and thus have the potential to act as a dual target nucleic acid molecule.
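The 5-nt example can be reproduced with a few lines of Python. The sketch assumes strict Watson-Crick pairing (G-U wobble pairs are counted as mismatches), and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")
_WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def duplex_mismatches(first: str, second: str) -> list[tuple[int, int]]:
    """Mismatched pairs when two 5'->3' strands anneal in antiparallel.

    Returns (position in `first`, position in `second`) tuples, each counted
    from that strand's own 5' end. Assumption: only A-U and G-C count as pairs.
    """
    if len(first) != len(second):
        raise ValueError("strands must have equal length")
    n = len(first)
    return [
        (i + 1, n - i)
        for i in range(n)
        if (first[i], second[n - 1 - i]) not in _WATSON_CRICK
    ]


first_site = "AUGAU"            # gene 1 mRNA segment, written 5'->3'
second_site = "UACUG"[::-1]     # gene 2 segment given 3'->5'; reversed to 5'->3'

first_seq = reverse_complement(first_site)    # strand targeting gene 1
second_seq = reverse_complement(second_site)  # strand targeting gene 2

print(first_seq, second_seq)                     # AUCAU AUGAC
print(duplex_mismatches(first_seq, second_seq))  # [(1, 5)]
```

The single reported pair, position 1 of the first sequence against position 5 of the second, is the mismatch between the A at the 5' end of the first sequence and the C at the 3' end of the second sequence.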

본 발명에서 사용되는 용어, "센스 서열", "세그먼트화된 서열" 또는 siRNA의 "guide RNA"는, 타겟하려는 제 1 유전자에서 발현된 mRNA 5'에서 3' 서열에 상보적인 서열 (3' -> 5')을 의미하고, "안티센스 서열" 또는 siRNA의 "passenger RNA"이란, 타겟하려는 제 2 유전자에서 발현된 mRNA 3'에서 5' 서열에 상보적인 서열 (5' -> 3')을 의미할 수 있다.As used herein, the terms “sense sequence”, “segmented sequence” or “guide RNA” of siRNA may mean a sequence complementary to a 5’ to 3’ sequence of mRNA expressed from a first gene to be targeted (3’ -> 5’), and “antisense sequence” or “passenger RNA” of siRNA may mean a sequence complementary to a 3’ to 5’ sequence of mRNA expressed from a second gene to be targeted (5’ -> 3’).

본 발명에서 사용되는 용어, "코딩서열(coding sequence, CDS)"이란, 단백질로 번역되는 서열을 의미할 수 있다.The term "coding sequence (CDS)" used in the present invention may mean a sequence that is translated into a protein.

본 발명에서 사용되는 용어, "시작코돈"이란 서열들이 유래한 유전자의 mRNA가 전사되는 시작부위를 의미한다. 예를들어, 서열들이 시작코돈으로부터 75bp에 위치하지 않을 것이라는 요건은 세그멘트화된 서열이 유래된 제 1 유전자의 개시코돈으로부터 세그멘트화된 서열이 75bp 이내에 위치한 서열로부터 유래되지 않는 경우를 의미한다.The term "start codon" as used in the present invention means the starting site at which the mRNA of the gene from which the sequences are derived is transcribed. For example, the requirement that the sequences are not located 75 bp from the start codon means that the segmented sequence is not derived from a sequence located within 75 bp of the start codon of the first gene from which the segmented sequence is derived.

In the present invention, the alignment may use the segmented sequence written in the 5' to 3' direction (the antisense sequence to the mRNA of the first gene) and likewise the "matched sequence" written in the 5' to 3' direction (the antisense sequence to the mRNA of the second gene). More specifically, suppose the segmented sequence is 5'-ATGCTAC-3' and the sequence matched to it is 5'-GTAGCAT-3', so that the segmented sequence targets the mRNA of the first gene and the matched sequence targets the mRNA of the second gene. The alignment then uses 5'-ATGCTAC-3' for the segmented sequence and 5'-GTAGCAT-3' for the matched sequence, and if neither aligns to and matches any sequence other than the first gene and the second gene, respectively, the pair can be selected as a dual target nucleic acid molecule.

본 발명에서 이중 타겟 핵산분자는 siRNA 또는 shRNA로 설계될 수 있으나, 이에 제한되지 않는다. 상기 siRNA로 설계되는 경우, 17 내지 24bp의 이중가닥 siRNA일 수 있고, 한 가닥은 "세그멘트화된 서열"로부터 유래된 것, 즉, 제 1 유전자로부터 유래된 것일 수 있고, 다른 한 가닥은 "매칭된 서열"로부터 유래된 것, 즉 제 2 유전자로부터 유래된 것일 수 있다. shRNA로 설계되는 경우, 제 1 유전자로부터 유래된 서열, 헤어핀 구조를 이룰 수 있는 구조체 및 제 2 유전자로부터 유래된 서열을 포함한 구조로 설계될 수 있다.In the present invention, the dual target nucleic acid molecule may be designed as siRNA or shRNA, but is not limited thereto. When designed as siRNA, it may be a double-stranded siRNA of 17 to 24 bp, and one strand may be derived from a "segmented sequence", that is, derived from a first gene, and the other strand may be derived from a "matched sequence", that is, derived from a second gene. When designed as shRNA, it may be designed as a structure including a sequence derived from a first gene, a structure capable of forming a hairpin structure, and a sequence derived from a second gene.

본 발명에서 제 1 유전자 또는 제 2 유전자를 타겟한다는 의미는 제 1 유전자의 전사체 또는 제 2 유전자의 전사체를 타겟(표적화)한다는 의미일 수 있다.In the present invention, targeting the first gene or the second gene may mean targeting the transcript of the first gene or the transcript of the second gene.

본 발명의 핵산분자는 서로 다른 유전자를 타겟하는 것을 목적으로 하고, 세그멘트화된 서열이 타겟하는 제 1 유전자와 "매칭된 서열"이 타겟하는 제 2 유전자는 동일한 질환, 예를들어, 암과 관련된 유전자, 즉 온코진일 수 있으나 이에 제한되는 것은 아니다.The nucleic acid molecules of the present invention are intended to target different genes, and the first gene targeted by the segmented sequence and the second gene targeted by the "matched sequence" may be genes associated with the same disease, for example, cancer, i.e., oncogenes, but are not limited thereto.

도 6은 일 실시예에 따른 이중 타겟 핵산분자를 설계하는 장치(200)의 블록도이다.FIG. 6 is a block diagram of a device (200) for designing a dual target nucleic acid molecule according to one embodiment.

Referring to FIG. 6, the device (200) may include a memory (210), an input unit (220), and at least one processor (230). The memory (210), the input unit (220), and the at least one processor (230) may operate according to the method of designing a nucleic acid molecule proposed in the above embodiments. However, the components of the device (200) according to one embodiment are not limited to the above example. According to another embodiment, the device (200) for designing a nucleic acid molecule may include more or fewer components than those described above.

일 실시예에 따른 메모리(210)는 유전자의 서열정보를 기초로 기 생성된 데이터 베이스를 저장할 수 있다. 예를 들어, 메모리(210)는 데이터 베이스의 일 예로 NCBI의 데이터를 저장할 수 있고, 이미 형성된 이중 특이적 핵산 분자의 정보를 저장할 수 있다. 또한, 다른 실시예에 따라, 메모리(210)는 NCBI를 통해 획득한 유전자정보들을 저장할 수 있다.The memory (210) according to one embodiment can store a database that has been previously created based on the sequence information of a gene. For example, the memory (210) can store data of NCBI as an example of a database, and can store information on already formed dual specific nucleic acid molecules. In addition, according to another embodiment, the memory (210) can store genetic information obtained through NCBI.

일 실시예에 따른 입력부(220)는 타겟 유전자명 또는 서열정보등을 입력할 수 있다. 다만, 이는 일 실시예일 뿐, 입력부(220)는 이중 타겟 핵산 서열 설계를 위해 요구되는 모든 사용자 입력을 수신할 수 있다. According to one embodiment, the input unit (220) can input target gene names or sequence information, etc. However, this is only one embodiment, and the input unit (220) can receive all user inputs required for dual target nucleic acid sequence design.

At least one processor (230) according to one embodiment may use the pre-generated database to (1) generate sequence information through a segmentation step in which the sequence of the mRNA of the first gene is segmented, starting from its 5' end, into segments 15 to 30 mer in length, where segments of 15 to 30 mer are taken repeatedly with the starting point shifted by 1 nt (nucleotide) at a time from the 5' end toward the 3' end of the mRNA; and

(2) 제 2 유전자의 mRNA의 3' 말단으로부터 서열 정보를 15 내지 30mer의 길이로 세그멘트화하여 서열 정보를 생성하는 세그멘트화 단계를 통해 서열 정보를 생성할 수 있으며, 상기 세그멘트화 단계는, 상기 mRNA의 3' 말단으로부터 5' 말단 방향으로 1nt씩 순차적으로 이격된 지점을 기점으로 하여 15 내지 30mer 길이만큼 복수 번의 세그멘트화하는 것일 수 있고, 상기 세그멘트화된 서열을 얼라이먼트하여 매칭된 서열들을 점수화시켜, 점수가 높은 서열들을 특정 기준에 따라 선택하여, 이를 이중 타겟 핵산분자의 후보물질로 결정할 수 있다.(2) The sequence information can be generated through a segmentation step of segmenting the sequence information from the 3' end of the mRNA of the second gene into segments of 15 to 30 mer in length, and the segmentation step can be segmented multiple times into segments of 15 to 30 mer in length, starting from points sequentially spaced by 1 nt each in the 5' end direction from the 3' end of the mRNA, and the segmented sequences can be aligned to score the matched sequences, and sequences with high scores can be selected according to specific criteria to determine them as candidates for the dual target nucleic acid molecule.

상기 점수화의 경우, 기존 대한민국출원 10-2019-0610720에서 개시한 점수화방법을 이용할 수 있다.For the above scoring, the scoring method disclosed in the existing Republic of Korea application 10-2019-0610720 can be used.

또한, 결정된 후보물질의 전체 뉴클레오타이드(nucleotide, nt) 길이 1.00에 대해 5' 말단 또는 3' 말단으로부터 0.25 내지 0.40에 해당하는 위치 또는 0.60 내지 0.75에 해당하는 위치에 미스매치(mismatch)가 존재하는 경우, 최적의 이중 타겟 핵산분자를 도출할 수 있다.In addition, when a mismatch exists at a position corresponding to 0.25 to 0.40 or a position corresponding to 0.60 to 0.75 from the 5' end or the 3' end with respect to the total nucleotide (nt) length of 1.00 of the determined candidate substance, an optimal dual target nucleic acid molecule can be derived.

상기 세그멘트화된 서열과 매칭된 서열의 5' 말단 또는 3' 말단으로부터 0.25 내지 0.30에 해당하는 위치 또는 0.70 내지 0.75에 해당하는 위치에 미스매치(mismatch)가 존재하는 경우 가중치를 부여할 수 있다.A weight may be assigned when a mismatch exists at a position corresponding to 0.25 to 0.30 or a position corresponding to 0.70 to 0.75 from the 5' end or the 3' end of the sequence matching the segmented sequence.

본 발명에 따른 장치는 프로세서, 프로그램 데이터를 저장하고 실행하는 메모리, 디스크 드라이브와 같은 영구 저장부(permanent storage), 외부 장치와 통신하는 통신 포트, 터치 패널, 키(key), 버튼 등과 같은 사용자 인터페이스 장치 등을 포함할 수 있다. 소프트웨어 모듈 또는 알고리즘으로 구현되는 방법들은 상기 프로세서상에서 실행 가능한 컴퓨터가 읽을 수 있는 코드들 또는 프로그램 명령들로서 컴퓨터가 읽을 수 있는 기록 매체 상에 저장될 수 있다. 여기서 컴퓨터가 읽을 수 있는 기록 매체로 마그네틱 저장 매체(예컨대, ROM(read-only memory), RAM(random-access memory), 플로피 디스크, 하드 디스크 등) 및 광학적 판독 매체(예컨대, 시디롬(CD-ROM), 디브이디(DVD: Digital Versatile Disc)) 등이 있다. 컴퓨터가 읽을 수 있는 기록 매체는 네트워크로 연결된 컴퓨터 시스템들에 분산되어, 분산 방식으로 컴퓨터가 판독 가능한 코드가 저장되고 실행될 수 있다. 매체는 컴퓨터에 의해 판독가능하며, 메모리에 저장되고, 프로세서에서 실행될 수 있다. The device according to the present invention may include a processor, a memory for storing and executing program data, a permanent storage such as a disk drive, a communication port for communicating with an external device, a user interface device such as a touch panel, a key, a button, etc. The methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable codes or program commands executable on the processor. Here, the computer-readable recording medium includes a magnetic storage medium (e.g., a read-only memory (ROM), a random-access memory (RAM), a floppy disk, a hard disk, etc.) and an optical reading medium (e.g., a CD-ROM, a Digital Versatile Disc (DVD)). The computer-readable recording medium may be distributed to computer systems connected to a network, so that the computer-readable code may be stored and executed in a distributed manner. The medium is readable by a computer, stored in a memory, and executed by a processor.

본 발명에서 인용하는 공개 문헌, 특허 출원, 특허 등을 포함하는 모든 문헌들은 각 인용 문헌이 개별적으로 및 구체적으로 병합하여 나타내는 것 또는 본 발명에서 전체적으로 병합하여 나타낸 것과 동일하게 본 발명에 병합될 수 있다.All documents, including published documents, patent applications, patents, etc., cited in the present invention may be incorporated into the present invention to the same extent as if each cited document were individually and specifically indicated to be incorporated or collectively indicated to be incorporated into the present invention as a whole.

본 발명의 이해를 위하여, 도면에 도시된 바람직한 실시 예들에서 참조 부호를 기재하였으며, 본 발명의 실시 예들을 설명하기 위하여 특정 용어들을 사용하였으나, 특정 용어에 의해 본 발명이 한정되는 것은 아니며, 본 발명은 당업자에 있어서 통상적으로 생각할 수 있는 모든 구성 요소들을 포함할 수 있다. For the purpose of understanding the present invention, reference numerals have been given to preferred embodiments illustrated in the drawings, and specific terms have been used to describe the embodiments of the present invention; however, the present invention is not limited by the specific terms, and the present invention may include all components that can be commonly conceived by those skilled in the art.

The present invention may be represented by functional block configurations and various processing steps. These functional blocks may be implemented by any number of hardware and/or software configurations that perform specific functions. For example, the present invention may employ integrated circuit configurations such as memory, processing, logic, and look-up tables, which can perform various functions under the control of one or more microprocessors or other control devices. Just as the components of the present invention may be implemented with software programming or software elements, the present invention may be implemented in a programming or scripting language such as C, C++, Java, assembler, R, or Python, including various algorithms implemented as a combination of data structures, processes, routines, or other programming constructs. The functional aspects may be implemented as algorithms executed on one or more processors. In addition, the present invention may employ conventional techniques for electronic environment setting, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" may be used broadly and are not limited to mechanical and physical configurations. These terms may include the meaning of a series of software routines in connection with a processor or the like.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For the sake of brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example, and in an actual device they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. In addition, unless a component is specifically described with a term such as "essential" or "important", it may not be a component that is necessarily required for the application of the present invention.

In the specification of the present invention (especially in the claims), the term "the" and similar referential terms may refer to both the singular and the plural. In addition, when a range is described in the present invention, the description includes inventions to which the individual values within the range are applied (unless stated otherwise), and is equivalent to stating each individual value constituting the range in the detailed description of the invention. Finally, unless an order is explicitly stated for the steps constituting the method according to the present invention, or stated to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited by the order in which the steps are described. The use of any example or exemplary term (for example, "etc.") in the present invention is intended merely to describe the invention in detail, and the scope of the invention is not limited by such examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will recognize that various modifications, combinations, and alterations may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

The present invention will be described in more detail through the following examples. However, the following examples are intended only to illustrate the content of the present invention, and the present invention is not limited thereto.

Example 1. Design of dual target siRNA

After extracting the mRNA sequences of various pairs of target genes (a first gene and a second gene), sequences complementary to them were generated. For the first gene, the sequence complementary to the mRNA sequence was segmented into lengths of 17 to 24 mer, shifting by 1 nt (nucleotide) at a time in the 3' to 5' direction; for the second gene, the sequence complementary to the mRNA sequence was segmented into lengths of 17 to 24 mer, shifting by 1 nt at a time in the 5' to 3' direction. Each of the segmented sequences of the two genes was aligned against approximately 37,961 gene sequences provided by NCBI using the Striped Smith-Waterman algorithm. Thereafter, a weight of 1 point was given to the segmented sequence in the mRNA of the first gene in each of the following cases:
1) the segmented sequence is not located within 75 nt of the start codon;
2) the content of G and C bases is 36% to 52% of the total number of bases;
3) the GC repeat sequence is less than 3;
4) the AT repeat sequence is less than 4;
5) the segmented sequence is a sense sequence and a G or C base is present at the first position;
6) the segmented sequence is a sense sequence and an A base is present at the 3rd position;
7) the segmented sequence is a sense sequence and a T or U base is present at the 10th position;
8) the segmented sequence is a sense sequence and a G base is not present at the 13th position;
9) the segmented sequence is a sense sequence and an A base is present at the 19th position;
10) the segmented sequence is a sense sequence and neither a G nor a C base is present at the 19th position;
11) the segmented sequence is an antisense sequence and no A, T, or U base is present at the first position;
12) the segmented sequence is an antisense sequence and a G base is present at the first position;
13) the segmented sequence is an antisense sequence and an A base is present at the 6th position;
14) the segmented sequence is an antisense sequence and the 19th base is neither G nor C;
15) the segmented sequence is an antisense sequence and the 19th base is U or T; and
16) the segmented sequence is located in the coding sequence (CDS).
For each aligned and matched sequence, the segmented sequence in the mRNA of the second gene targeted by the matched sequence was likewise given a weight of 1 point in each of cases 1) to 16) above, with the distance from the start codon in case 1) stated as 75 bp. In this way, 99 pairs of dual-specific siRNAs of 17 to 24 mer in length were produced, which bind complementarily to the mRNA sequences of the first gene and the second gene, respectively, while also being complementary to each other (Fig. 1).
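The segmentation and point-weighting steps of Example 1 can be sketched as follows. This is a minimal illustration and not the applicants' implementation: the function names `segments`, `gc_fraction`, and `partial_sense_score` are hypothetical, the window simply slides left to right over whatever string is passed in (the strand and direction handling of Example 1 is left to the caller), and only the sense-strand criteria whose wording is unambiguous (2, 5, 6, 7, and 8) are scored. The alignment step (Striped Smith-Waterman against NCBI gene sequences) is not shown.

```python
from typing import Iterator, Tuple


def segments(seq: str, min_len: int = 17, max_len: int = 24) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, window) for every window of min_len..max_len nt,
    shifting the start point by 1 nt at a time."""
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            yield start, seq[start:start + length]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases relative to the total number of bases."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def partial_sense_score(sense: str) -> int:
    """Score a sense-strand segment on criteria 2, 5, 6, 7 and 8 only
    (1 point per satisfied criterion). Positions are 1-based in the text,
    so position k is index k - 1 here. Other criteria are NOT implemented."""
    s = sense.upper().replace("U", "T")  # treat U and T alike
    if len(s) < 13:
        raise ValueError("segment must be at least 13 nt to check position 13")
    score = 0
    if 0.36 <= gc_fraction(s) <= 0.52:   # criterion 2: GC content 36% to 52%
        score += 1
    if s[0] in "GC":                     # criterion 5: G or C at position 1
        score += 1
    if s[2] == "A":                      # criterion 6: A at position 3
        score += 1
    if s[9] == "T":                      # criterion 7: T or U at position 10
        score += 1
    if s[12] != "G":                     # criterion 8: no G at position 13
        score += 1
    return score


if __name__ == "__main__":
    toy = "ACGT" * 6  # made-up 24-nt sequence, for illustration only
    for start, window in list(segments(toy))[:3]:
        print(start, window, partial_sense_score(window))
```

Criteria such as the "GC repeat" and "AT repeat" limits are omitted because the text does not define how a repeat is counted; any implementation of them would need that definition first.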

Example 2. Design of dual target siRNAs containing mismatches

Since RISC generally favors only one of the two strands of an siRNA, the two sequences must be loaded into RISC at similar ratios in order to properly down-regulate the mRNAs of two genes. In conventional dual target siRNAs, however, the sense sequence, i.e., the guide RNA (a nucleic acid sequence complementary to the mRNA of the first gene), and the antisense sequence, i.e., the passenger RNA (a nucleic acid sequence complementary to the mRNA of the second gene), are 100% complementary to each other at every base, and off-target effects occurred. Judging that the number and location of mismatches would be important for RISC to recognize the two strands at similar ratios, siRNAs were designed with mismatches at various locations between the guide RNA and passenger RNA sequences in order to knock down both genes while controlling the off-target effect (Fig. 2), and the correlation between the mismatch location and the gene expression inhibition effect of each sequence was then analyzed.

Specifically, for the 99 pairs of dual target siRNAs of 17 to 24 mer in length (198 nucleic acid sequences in total) derived for various gene sets in Example 1, the location of each mismatch between the two sequences was expressed as a location relative to the total sequence length (mismatch location) in order to normalize the different lengths (Fig. 3a), and the mismatch locations and inhibition efficiencies (KD_eff) of all 198 sequences were compiled as data (Fig. 3b). The mismatch location was divided into intervals of 0.05 (e.g., 0.00~0.05, 0.05~0.10, …, 0.90~0.95, 0.95~1.00). To analyze the difference in inhibition efficiency according to the presence or absence of a mismatch in each interval, the differences in inhibition efficiency between the top 30% and bottom 30% of sequences with a mismatch and the top 30% and bottom 30% of sequences without a mismatch were calculated using Equations 1 and 2. In Equations 1 and 2, the results for the top 30% and the bottom 30% are 1 if the mismatch has no effect and differ from 1 if it has an effect. Equation 3 was therefore used to analyze the difference between the results of the two equations (the relative KD_eff difference according to mismatch location; a positive value if the mismatch improves the expression inhibition efficiency for the two genes, a negative value if the mismatch reduces it), and the effect of the mismatch location of the 99 pairs of siRNAs (198 sequences in total) on the gene expression inhibition efficiency of the bispecific siRNA was analyzed.
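The length normalization and 0.05-wide binning of mismatch locations described above can be sketched as follows. The sketch rests on several assumptions that the text does not confirm: both strands are written 5' to 3' and pair in antiparallel with no overhangs, only Watson-Crick pairs count as matches (a G-U wobble is treated as a mismatch), the mismatch location is taken as position divided by length, and intervals are closed on the left. The function names are hypothetical. Equations 1 to 3 are not reproduced, since they are not given in the text.

```python
from typing import List

# Watson-Crick partners for RNA bases (assumption: wobble pairs are mismatches).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def mismatch_positions(first: str, second: str) -> List[int]:
    """Return 1-based positions, counted from the 5' end of `first`, at which
    `first` does not pair with `second`. Both strands are given 5'->3' and are
    assumed to have equal length and to pair in antiparallel orientation."""
    if len(first) != len(second):
        raise ValueError("strands must have the same length")
    n = len(first)
    return [
        i + 1
        for i in range(n)
        if COMPLEMENT[first[i]] != second[n - 1 - i]
    ]


def mismatch_location(position: int, length: int) -> float:
    """Location relative to total length (assumed definition: position / length)."""
    return position / length


def bin_index(position: int, length: int, n_bins: int = 20) -> int:
    """Index of the 0.05-wide interval containing position / length.
    Integer arithmetic avoids floating-point errors at interval boundaries;
    a location of exactly 1.00 is placed in the last interval."""
    return min(position * n_bins // length, n_bins - 1)


if __name__ == "__main__":
    first = "AUGC"   # made-up toy strands, for illustration only
    second = "GCCU"  # perfect partner would be "GCAU"
    for pos in mismatch_positions(first, second):
        print(pos, mismatch_location(pos, len(first)), bin_index(pos, len(first)))
```

Using integer arithmetic for the bin index matters here because a location such as 14/20 sits exactly on an interval boundary, where floating-point division can land on either side.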

As a result, based on the single-stranded sequence of each siRNA pair and taking the total length of the siRNA as 1.0, gene inhibition efficiency was found to increase when a mismatch was present at nucleotides located at 0.25~0.30 and 0.70~0.75 from the 5' end (for example, for a 20mer siRNA, a mismatch at the 14th nucleotide position from the 5' end) (Fig. 4). In addition, an asymmetric mismatch structure was found to increase the selectivity of siRNA efficacy: a single-stranded siRNA with a mismatch at position 0.3 tended to be selected as the working sequence, and the other strand, with the complementary mismatch at position 0.7, tended to be selected (Fig. 5). Consequently, position 0.3 was confirmed to be the most important site for siRNA efficacy.
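To relate the fractional locations above to concrete nucleotide positions, the following sketch lists the 1-based positions of a strand whose relative location falls within a given window, and gives the position on the partner strand that faces a given position. It assumes, without confirmation from the text, that the location is position divided by length, that both window bounds are inclusive, and that the two strands have equal length with no overhangs; `positions_in_window` and `partner_position` are hypothetical names.

```python
from fractions import Fraction
from typing import List


def positions_in_window(length: int, low: str, high: str) -> List[int]:
    """1-based positions i (from the 5' end) with low <= i / length <= high.
    Bounds are passed as decimal strings so that they are represented exactly."""
    lo, hi = Fraction(low), Fraction(high)
    return [i for i in range(1, length + 1) if lo <= Fraction(i, length) <= hi]


def partner_position(position: int, length: int) -> int:
    """Position on the opposite strand (counted from its own 5' end) that faces
    `position`, assuming equal-length antiparallel strands."""
    return length + 1 - position


if __name__ == "__main__":
    n = 20
    for window in (("0.25", "0.30"), ("0.70", "0.75")):
        for i in positions_in_window(n, *window):
            print(window, i, partner_position(i, n))
```

Under these assumptions, the 14th position of a 20mer lies in the 0.70~0.75 window and faces the 7th position of the other strand, which matches the pairing of the 14th and 7th positions recited in the claims.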

Claims

A dual target nucleic acid molecule comprising:
1) a first nucleic acid molecule targeting the mRNA of a first gene; and
2) a second nucleic acid molecule targeting the mRNA of a second gene,
wherein the first nucleic acid molecule and the second nucleic acid molecule are partially complementary, and
base pairs of the first nucleic acid molecule and the second nucleic acid molecule are mismatched at a position corresponding to 0.25 to 0.40, or a position corresponding to 0.60 to 0.75, from the 5' end or the 3' end of the first nucleic acid molecule or the second nucleic acid molecule, where the total nucleotide (nt) length of the first nucleic acid molecule or the second nucleic acid molecule is taken as 1.00.

The dual target nucleic acid molecule of claim 1, wherein the first nucleic acid molecule has a sequence that is 100% complementary to the mRNA of the first gene, and the second nucleic acid molecule has a sequence that is 100% complementary to the mRNA of the second gene.

The dual target nucleic acid molecule of claim 1, wherein the first nucleic acid molecule and the second nucleic acid molecule have 1 to 3 base pair mismatches.

The dual target nucleic acid molecule of claim 1, wherein the first nucleic acid molecule or the second nucleic acid molecule is 14 to 30 mer.

The dual target nucleic acid molecule of claim 4, wherein the first nucleic acid molecule or the second nucleic acid molecule is 17 to 24 mer.

The dual target nucleic acid molecule of claim 1, wherein the dual target nucleic acid molecule is siRNA or shRNA.

The dual target nucleic acid molecule of claim 1, wherein, when the dual target nucleic acid molecule is 20 mer,
1) the base at the 7th position from the 5' end of the first nucleic acid molecule and the base at the 14th position from the 5' end of the second nucleic acid molecule are mismatched;
2) the base at the 9th position from the 5' end of the first nucleic acid molecule and the base at the 12th position from the 5' end of the second nucleic acid molecule are mismatched; and
3) the base at the 14th position from the 5' end of the first nucleic acid molecule and the base at the 7th position from the 5' end of the second nucleic acid molecule are mismatched.

A method for designing a dual target nucleic acid molecule, comprising the step of selecting, from a pre-formed database of dual target nucleic acid molecules, a dual target nucleic acid molecule in which base pairs of a first nucleic acid molecule and a second nucleic acid molecule are mismatched at a position corresponding to 0.25 to 0.40, or a position corresponding to 0.60 to 0.75, from the 5' end or the 3' end of the first nucleic acid molecule or the second nucleic acid molecule, where the total nucleotide (nt) length of the first nucleic acid molecule or the second nucleic acid molecule of the dual target nucleic acid molecule is taken as 1.00.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the first nucleic acid molecule has a sequence that is 100% complementary to the mRNA of the first gene, and the second nucleic acid molecule has a sequence that is 100% complementary to the mRNA of the second gene.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the first nucleic acid molecule and the second nucleic acid molecule have 1 to 3 base pair mismatches.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the first nucleic acid molecule or the second nucleic acid molecule is 14 to 30 mer.

The method for designing a dual target nucleic acid molecule of claim 11, wherein the first nucleic acid molecule or the second nucleic acid molecule is 17 to 24 mer.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the dual target nucleic acid molecule is siRNA or shRNA.

The method for designing a dual target nucleic acid molecule of claim 8, wherein, when the dual target nucleic acid molecule is 20 mer,
1) the base at the 7th position from the 5' end of the first nucleic acid molecule and the base at the 14th position from the 5' end of the second nucleic acid molecule are mismatched;
2) the base at the 9th position from the 5' end of the first nucleic acid molecule and the base at the 12th position from the 5' end of the second nucleic acid molecule are mismatched; and
3) the base at the 14th position from the 5' end of the first nucleic acid molecule and the base at the 7th position from the 5' end of the second nucleic acid molecule are mismatched.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the creation of the dual target nucleic acid molecule database comprises
a segmentation step of generating sequence information by segmenting the mRNA of the first gene from its 5' end into lengths of 15 to 30 mer,
wherein the segmentation step
generates the sequence information through a plurality of segmentations of 15 to 30 mer in length, each starting from a point sequentially shifted by 1 nt (nucleotide) in the 3' direction from the 5' end of the mRNA.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the creation of the dual target nucleic acid molecule database comprises
a segmentation step of generating sequence information by segmenting the mRNA of the second gene from its 3' end into lengths of 15 to 30 mer,
wherein the segmentation step
generates the sequence information through a plurality of segmentations of 15 to 30 mer in length, each starting from a point sequentially shifted by 1 nt in the 5' direction from the 3' end of the mRNA.

The method for designing a dual target nucleic acid molecule of claim 15 or 16, wherein the sequence information of the segmentation step is generated to have a length of 17 to 21 mer.

The method for designing a dual target nucleic acid molecule of claim 8, wherein the creation of the dual target nucleic acid molecule database comprises:
(1) a segmentation step of generating sequence information by segmenting the mRNA of the first gene from its 5' end into lengths of 15 to 30 mer,
wherein the segmentation step
generates the sequence information through a plurality of segmentations of 15 to 30 mer in length, each starting from a point sequentially shifted by 1 nt (nucleotide) in the 3' direction from the 5' end of the mRNA;
(2) a segmentation step of generating sequence information by segmenting the mRNA of the second gene from its 3' end into lengths of 15 to 30 mer,
wherein the segmentation step
generates the sequence information through a plurality of segmentations of 15 to 30 mer in length, each starting from a point sequentially shifted by 1 nt in the 5' direction from the 3' end of the mRNA; and
(3) a step of aligning the sequence information generated from the mRNA of the first gene with the sequence information generated from the mRNA of the second gene.

The method for designing a dual target nucleic acid molecule of claim 18, further comprising a step of scoring matched sequences after the aligning step.