JP2023130393A

JP2023130393A - Water-soluble transmembrane proteins and methods for their preparation and use

Info

Publication number: JP2023130393A
Application number: JP2023104641A
Authority: JP
Inventors: Shuguang Zhang; Fei Tao
Original assignee: Massachusetts Institute of Technology
Current assignee: Massachusetts Institute of Technology
Priority date: 2015-02-18
Filing date: 2023-06-27
Publication date: 2023-09-20
Also published as: JP2025060920A; JP2020143063A; CN106459174A; JP2022037160A; JP7061461B2; JP2017516492A; CN106459174B; CN113929766A

Abstract

[Problem] To provide water-soluble membrane proteins, methods for preparing them, and methods for using them.
[Solution] A computer-implemented method for selecting a water-soluble variant of a G protein-coupled receptor (GPCR). The method comprises inputting the sequence of the GPCR; obtaining a variant of the GPCR; then obtaining an α-helical secondary structure result for the variant to confirm that α-helical secondary structure is maintained within the variant; and obtaining a transmembrane region result for the variant to confirm its water solubility, thereby selecting a water-soluble variant of the GPCR.
[Selection diagram] None

Description

(Related applications)
This application claims priority to U.S. Patent Application No. 14/669,753 and International Application No. PCT/US2015/022780, both filed on March 26, 2015. Both of those applications claim, under 35 U.S.C. 119(e), the benefit of the filing dates of U.S. Provisional Application No. 62/117,550, filed February 18, 2015; U.S. Provisional Application No. 61/993,783, filed May 15, 2014; and U.S. Provisional Application No. 61/971,388, filed March 27, 2014.

This application also claims, under 35 U.S.C. 119(e), the benefit of the filing dates of U.S. Provisional Application No. 62/117,550, filed February 18, 2015; U.S. Provisional Application No. 61/993,783, filed May 15, 2014; and U.S. Provisional Application No. 61/971,388, filed March 27, 2014.

The entire contents of each of the above-referenced applications, including all drawings and sequence listings, are incorporated herein by reference.

Membrane proteins play important roles in all biological systems. About 30% of all genes in nearly every sequenced genome encode membrane proteins. However, our detailed understanding of their structure and function lags far behind our understanding of soluble proteins. As of March 2015, the Protein Data Bank held more than 100,000 structures. Only 945 of these were membrane protein structures, representing 530 unique structures. They included 28 G protein-coupled receptors and no tetraspanin membrane proteins.

Although membrane receptors are very important, several obstacles stand in the way of elucidating their structure and function, including their recognition and ligand-binding properties. The most critical and difficult challenge is that producing milligram quantities of soluble, stable receptors is extremely hard. Inexpensive large-scale production methods are urgently needed and have therefore been the focus of extensive research. Detailed structural studies become possible only after these obstacles are overcome.

Zhang et al. (U.S. Pat. No. 8,637,452, incorporated herein by reference) describe an improved method for producing water-soluble GPCRs in which certain hydrophobic amino acids within the transmembrane region are replaced by polar amino acids. However, this method is labor-intensive. In addition, although the modified transmembrane regions meet water-solubility criteria, further improvements in water solubility and ligand binding are desired. Accordingly, there is a need in the art for improved methods for studying G protein-coupled receptors.

The present invention relates to methods for designing, selecting, and/or producing water-soluble membrane proteins and peptides. It also relates to the peptides (and transmembrane domains) designed, selected, or produced by those methods, to compositions containing the peptides, and to methods of using them. In particular, the method designs libraries of water-soluble membrane peptides, such as GPCR variants and tetraspanin membrane proteins, using the "QTY principle." Under this principle, water-insoluble amino acids (leucine, isoleucine, valine, and phenylalanine; single-letter codes L, I, V, F) are changed to nonionic water-soluble amino acids (glutamine, threonine, and tyrosine; single-letter codes Q, T, Y). Two additional nonionic amino acids, asparagine (N) and serine (S), may also be used to replace L, I, and V, but not F. In the embodiments discussed below, it should be understood that N and S are contemplated as substitutes for Q and T (where a variant is described) or for L, I, or V (where a native protein is described). For brevity, however, these other embodiments are not described in further detail in this application, since those skilled in the art will understand them from the teachings herein.
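As a minimal illustration, and not the authors' implementation, the core QTY code and the N/S alternatives described above can be written as lookup tables. The function name `qty_substitute` and the example segment are hypothetical.

```python
# Core QTY code: each hydrophobic residue maps to a nonionic hydrophilic one.
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}

# Permitted replacements, including the N and S alternatives for L, I and V.
# F has no N/S alternative and is always replaced with Y.
QTY_ALTERNATIVES = {
    "L": ("Q", "N", "S"),
    "I": ("T", "N", "S"),
    "V": ("T", "N", "S"),
    "F": ("Y",),
}


def qty_substitute(segment: str) -> str:
    """Apply the core QTY code to every L, I, V and F in a sequence segment.

    All other residues, including charged ones, are left unchanged.
    """
    return "".join(QTY_MAP.get(residue, residue) for residue in segment.upper())


if __name__ == "__main__":
    # Hypothetical segment, not taken from any real receptor.
    print(qty_substitute("ALLVIGFAW"))  # AQQTTGYAW
```

The sketch applies the code to every eligible residue. The partial substitutions described later would apply it only at selected positions.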

The present invention encompasses one or more modified, synthetic, and/or non-naturally occurring α-helical domains, and water-soluble polypeptides (e.g., "sGPCRs") comprising one or more such modified α-helical domains. Each modified α-helical domain comprises an amino acid sequence in which multiple hydrophobic amino acid residues (L, I, V, F) within an α-helical domain of a native membrane protein are replaced with nonionic hydrophilic amino acid residues (Q, T, T, Y, respectively, or "Q, T, Y") and/or with N and S. The invention also encompasses a method for preparing a water-soluble polypeptide. This method comprises replacing multiple hydrophobic amino acid residues (L, I, V, F) within one or more α-helical domains of a native membrane protein with nonionic hydrophilic amino acid residues (Q/N/S, T/N/S, Y). The invention further encompasses polypeptides prepared by replacing multiple hydrophobic amino acid residues (L, I, V, F) within an α-helical domain of a native membrane protein with nonionic hydrophilic amino acid residues (Q/N/S, T/N/S, Y, respectively). Such a variant can be designated by the name of the parent, or native, protein (e.g., CXCR4) followed by the abbreviation "QTY" (e.g., CXCR4-QTY).

Accordingly, one aspect of the present invention provides a method of operating a computer program to perform a scripted procedure for designing a water-soluble variant of a membrane protein, e.g., a G protein-coupled receptor (GPCR). The method comprises the following steps:
(1) inputting the sequence of the membrane protein (e.g., a GPCR) for analysis.
(2) obtaining a variant of the membrane protein in which multiple hydrophobic amino acids within its transmembrane (TM) domain α-helical segments ("TM regions") are substituted, wherein:
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V), and phenylalanine (F);
(b) each leucine (L) is independently replaced with glutamine (Q), asparagine (N), or serine (S);
(c) each isoleucine (I) and each valine (V) is independently replaced with threonine (T), asparagine (N), or serine (S); and
(d) each phenylalanine is replaced with tyrosine (Y).
(3) then obtaining an α-helical secondary structure result for the variant, to confirm that α-helical secondary structure is maintained within the variant.
(4) obtaining a transmembrane region result for the variant, to confirm the water solubility of the variant.
These steps design a water-soluble variant of the membrane protein (e.g., a GPCR).
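The four steps can be sketched as a single function. This is a hypothetical outline, not the patented program. Both predictors are passed in as callables because this paragraph does not name them (TMHMM 2.0 is named later for TM prediction). The function name, the parameter names, and the use of 0-based, half-open TM coordinates are all assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def design_variant(
    sequence: str,
    tm_regions: Sequence[Tuple[int, int]],
    helix_ok: Callable[[str], bool],
    has_tm_segment: Callable[[str], bool],
) -> Optional[str]:
    """Steps (1)-(4): substitute inside TM regions, then apply both checks.

    tm_regions are 0-based, half-open (start, end) intervals (an assumption).
    helix_ok and has_tm_segment stand in for external predictors.
    Returns the variant, or None if it fails either check.
    """
    residues = list(sequence)                  # step (1): input sequence
    for start, end in tm_regions:              # step (2): QTY inside TM regions only
        for i in range(start, end):
            residues[i] = QTY_MAP.get(residues[i], residues[i])
    variant = "".join(residues)
    if not helix_ok(variant):                  # step (3): helix maintained?
        return None
    if has_tm_segment(variant):                # step (4): still predicted TM?
        return None
    return variant


if __name__ == "__main__":
    # Toy predictors: accept every helix, report no TM segment.
    print(design_variant("MKLLVFEE", [(2, 6)], lambda s: True, lambda s: False))
    # MKQQTYEE
```

Because the checks are independent callables, steps (3) and (4) can run in either order, as the next paragraph allows.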

In certain embodiments, step (3) is performed before, simultaneously with, or after step (4). Additional steps described herein can be incorporated into the above procedure. The processing preferably uses computational steps predefined by a data processing system. The system uses automated computational systems and methods for selecting protein variants.

In certain embodiments, in step (2), one subset of the hydrophobic amino acids within a single TM region of the GPCR is substituted to create one member of a library of candidate variants. One or more different subsets of those hydrophobic amino acids are then substituted to create additional members of the library. In certain embodiments, the method may further comprise ranking all members of the library by a combined score. The combined score is a weighted combination of the α-helical secondary structure prediction result and the transmembrane region prediction result. In certain embodiments, the method further comprises ranking the variants with a ranking function. In certain embodiments, the ranking function may include a secondary structure component and a water-solubility component; for example, it may include weighting values for either or both components. In certain embodiments, the method is performed by a data processor, which may further comprise memory connected to it.
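One way to enumerate a per-region library and rank it by a weighted combined score is sketched below. It is an illustration under stated assumptions, not the authors' code. The two score functions are hypothetical placeholders for the helix and TM predictors, and the default 0.5/0.5 weights are arbitrary.

```python
from itertools import combinations
from typing import Callable, List, Tuple

QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def subset_library(segment: str) -> List[str]:
    """One library member per non-empty subset of substitutable positions."""
    sites = [i for i, aa in enumerate(segment) if aa in QTY_MAP]
    members = []
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            residues = list(segment)
            for i in subset:
                residues[i] = QTY_MAP[residues[i]]
            members.append("".join(residues))
    return members


def rank_library(
    members: List[str],
    helix_score: Callable[[str], float],
    solubility_score: Callable[[str], float],
    w_helix: float = 0.5,          # hypothetical weight
    w_solubility: float = 0.5,     # hypothetical weight
) -> List[Tuple[float, str]]:
    """Sort members by a weighted sum of the two scores, highest first."""
    scored = [
        (w_helix * helix_score(m) + w_solubility * solubility_score(m), m)
        for m in members
    ]
    return sorted(scored, reverse=True)
```

A segment with k substitutable residues yields 2**k - 1 members. Exhaustive enumeration therefore suits only short TM segments, and a real pipeline may need to sample subsets instead.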

In certain embodiments, the method may further comprise selecting the N members with the highest combined scores to form a first library of candidate variants for the TM region. Here, N is a predetermined integer (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more). In certain embodiments, the method may further comprise creating a library of candidate variants for one, two, three, four, five, or all six of the other TM regions of the GPCR. In certain embodiments, the method may further comprise replacing two or more TM regions of the GPCR with corresponding TM regions from the candidate variant libraries to create a combinatorial variant library. In certain embodiments, the method further comprises producing/expressing the combinatorial variants. In certain embodiments, the method further comprises testing the combinatorial variants for ligand binding (e.g., in a yeast two-hybrid assay) and selecting those whose ligand binding is substantially the same as that of the GPCR. In certain embodiments, the method further comprises testing the combinatorial variants for a biological function of the GPCR and selecting those whose biological function is substantially the same as that of the GPCR.
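The top-N selection and the assembly of a combinatorial library, taking one candidate from each TM region, can be sketched as follows. The names are hypothetical. The library is produced by a generator, so large libraries need not be held in memory at once.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List, Tuple


def top_n(ranked: List[Tuple[float, str]], n: int) -> List[str]:
    """Keep the n highest-scoring candidates for one TM region."""
    return [member for _, member in sorted(ranked, reverse=True)[:n]]


def combinatorial_library(
    per_region: Dict[str, List[str]],
) -> Iterator[Dict[str, str]]:
    """Yield every assembly that takes one candidate from each TM region."""
    regions = sorted(per_region)
    for choice in product(*(per_region[r] for r in regions)):
        yield dict(zip(regions, choice))


def library_size(per_region: Dict[str, List[str]]) -> int:
    """Number of assemblies: the product of the per-region list sizes."""
    return prod(len(candidates) for candidates in per_region.values())
```

For a GPCR with N candidates in each of its seven TM regions, `library_size` returns N**7.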

Certain water-soluble polypeptides of the invention can bind ligands that normally bind the wild-type or native membrane protein (e.g., a GPCR). In certain embodiments, amino acids within the candidate ligand-binding site of the native membrane protein are not substituted. Alternatively or additionally, the sequences of the extracellular and/or intracellular domains are identical to those of the native membrane protein.

The (nonionic) hydrophilic residues that replace one or more hydrophobic residues within the α-helical domain of the native membrane protein are selected from the group consisting of glutamine (Q), threonine (T), tyrosine (Y), asparagine (N), serine (S), and any combination thereof. In a further aspect, the hydrophobic residues that are substituted are selected from leucine (L), isoleucine (I), valine (V), and phenylalanine (F). In certain embodiments, phenylalanine residues in the α-helical domain of the protein are replaced with tyrosine; isoleucine and/or valine residues are each independently replaced with threonine (or S or N); and/or leucine residues are each independently replaced with glutamine (or S or N).

In certain embodiments, substantially all of the leucines (e.g., 96%, 97%, 98%, 99%, or 100%), or 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, or 95% of them, are replaced with glutamine. In certain embodiments, substantially all of the isoleucines (e.g., 96%, 97%, 98%, 99%, or 100%), or 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, or 95% of them, are replaced with threonine. In certain embodiments, substantially all of the valines (e.g., 96%, 97%, 98%, 99%, or 100%), or 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, or 95% of them, are replaced with threonine. In certain embodiments, substantially all of the phenylalanines (e.g., 96%, 97%, 98%, 99%, or 100%), or 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, or 95% of them, are replaced with tyrosine. In certain embodiments, one or more (e.g., 1, 2, or 3) of the leucines are not substituted. In certain embodiments, one or more (e.g., 1, 2, or 3) of the isoleucines are not substituted. In certain embodiments, one or more (e.g., 1, 2, or 3) of the valines are not substituted. In certain embodiments, one or more (e.g., 1, 2, or 3) of the phenylalanines are not substituted.
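The percentages above can be checked for any candidate by comparing it with the native sequence position by position. This is a small, hypothetical helper. It assumes the two sequences are already aligned without gaps, and it counts N and S as valid replacements alongside Q, T, and Y.

```python
from collections import Counter
from typing import Dict

# Permitted replacements for each hydrophobic residue (QTY code plus N/S).
ALLOWED = {
    "L": {"Q", "N", "S"},
    "I": {"T", "N", "S"},
    "V": {"T", "N", "S"},
    "F": {"Y"},
}


def substitution_fractions(native: str, variant: str) -> Dict[str, float]:
    """Fraction of each native L, I, V and F replaced by a permitted residue.

    Assumes the sequences are aligned position by position (no gaps).
    """
    if len(native) != len(variant):
        raise ValueError("native and variant must have the same length")
    totals = Counter(aa for aa in native if aa in ALLOWED)
    replaced = Counter(
        n for n, v in zip(native, variant) if n in ALLOWED and v in ALLOWED[n]
    )
    return {aa: replaced[aa] / totals[aa] for aa in totals}
```

For example, `substitution_fractions("LLLLIVF", "QQQLTVY")` gives 0.75 for L, 1.0 for I, 0.0 for V, and 1.0 for F. That matches an embodiment in which one leucine and one valine are left unsubstituted.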

In certain embodiments, the combinatorial variant library contains fewer than about 2 million members. In certain embodiments, the GPCR sequence includes information about the TM regions of the GPCR. In certain embodiments, the GPCR sequence is obtained from a protein structure database (e.g., PDB, UniProt). In certain embodiments, the TM regions of the GPCR are predicted from the GPCR sequence. For example, they can be predicted with the TMHMM 2.0 (transmembrane prediction using hidden Markov models) software module/package. In certain embodiments, the TMHMM 2.0 software module/package uses a dynamic baseline for peak searching.
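This paragraph names TMHMM 2.0, but the sketch below does not call TMHMM. It swaps in the simpler Kyte-Doolittle sliding-window hydropathy method (19-residue window, mean above 1.6) only to show how a QTY-substituted segment stops looking membrane-spanning. It also works through the library-size bound. With 7 TM regions and n candidates per region, the combinatorial library has n**7 members, so 7 per region (823,543) stays under about 2 million while 8 (2,097,152) does not. The function names are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> List[int]:
    """Start indices of windows whose mean hydropathy exceeds the threshold.

    A crude stand-in for a real TM predictor such as TMHMM 2.0.
    """
    scores = [KD[aa] for aa in seq.upper()]
    return [
        i
        for i in range(len(scores) - window + 1)
        if sum(scores[i:i + window]) / window > threshold
    ]


def max_candidates_per_region(limit: int = 2_000_000, regions: int = 7) -> int:
    """Largest n such that n**regions stays below limit."""
    n = 1
    while (n + 1) ** regions < limit:
        n += 1
    return n
```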

In certain embodiments, the method further comprises providing a polynucleotide sequence for each variant of the GPCR. The polynucleotide sequence may be codon-optimized for expression in a host. Examples include bacteria such as E. coli, yeasts such as Saccharomyces cerevisiae (budding yeast) or fission yeast, insect cells such as Sf9 cells, non-human mammalian cells, and human cells.
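To illustrate providing a polynucleotide for each variant, the sketch below back-translates a protein using one fixed codon per residue. The table contains valid codons that are commonly used in E. coli, but the choice is an assumption for illustration. This is not codon optimization as practiced by real tools, which also weigh codon-usage frequencies, GC content, and mRNA secondary structure.

```python
# One codon per residue, chosen from codons commonly used in E. coli.
# Illustrative assumption only; not a codon-optimization algorithm.
ECOLI_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein: str) -> str:
    """Return a coding DNA sequence for the protein, ending in a stop codon."""
    return "".join(ECOLI_CODON[aa] for aa in protein.upper()) + STOP_CODON
```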

In certain embodiments, the scripted procedure may comprise a VBA script. In certain embodiments, the scripted procedure can run on a Linux system (e.g., Ubuntu 12.04 LTS), a Unix system, the Microsoft Windows operating system, the Android operating system, or the Apple iOS operating system. Various programming languages, including C++, JavaScript, and MATLAB, can be used to implement the invention. The coded instructions can be stored in a memory device, such as a non-transitory computer-readable medium, for use with computer systems known to those skilled in the art.

In certain embodiments, the α-helical domain is one of the seven transmembrane α-helical domains of a native membrane protein that is a G protein-coupled receptor (GPCR). In some embodiments, the GPCR is selected from the group consisting of the following:
purinergic receptors (P2Y1, P2Y2, P2Y4, P2Y6); M1 and M3 muscarinic acetylcholine receptors; thrombin receptors (protease-activated receptor (PAR)-1, PAR-2); thromboxane (TXA2); sphingosine 1-phosphate (S1P2, S1P3, S1P4, and S1P5); lysophosphatidic acid (LPA1, LPA2, LPA3); angiotensin II (AT1); serotonin (5-HT2c and 5-HT4); somatostatin (sst5); endothelin (ETA and ETB); cholecystokinin (CCK1); the V1a vasopressin receptor; the D5 dopamine receptor; the fMLP formyl peptide receptor; the GAL2 galanin receptor; the EP3 prostanoid receptor; the A1 adenosine receptor; the α1 adrenergic receptor; the BB2 bombesin receptor; the B2 bradykinin receptor; the calcium-sensing receptor; chemokine receptors; the KSHV-ORF74 chemokine receptor; the NK1 tachykinin receptor; the thyroid-stimulating hormone (TSH) receptor; protease-activated receptors; neuropeptide receptors; the adenosine A2B receptor; P2Y purinergic receptors; metabotropic glutamate receptors; GRK5; GPCR-30; and CXCR4.

In other embodiments, the native membrane protein or membrane protein is an integral membrane protein. In a further aspect, the native membrane protein is a mammalian protein. The protein of the invention is preferably a human protein. In certain embodiments, a reference to a specific GPCR protein (e.g., CXCR4) refers to a mammalian GPCR, such as a non-human mammalian GPCR, or to a human GPCR.

In some embodiments, the α-helical domain is one of the seven transmembrane α-helical domains of a G protein-coupled receptor (GPCR) variant. Such a variant has been modified, e.g., in an extracellular or intracellular loop, to improve or alter ligand binding, as described elsewhere in the literature. For the purposes of the present invention, the terms "native" and "wild-type" refer to the protein (or α-helical domain) before it is made water-soluble by the methods described herein.

In certain embodiments, the membrane protein may be a tetraspanin membrane protein, characterized by four transmembrane α-helices. Approximately 54 human tetraspanin membrane proteins have been reviewed and annotated. Many are known to mediate intercellular signaling events that play important roles in regulating cell development, activation, proliferation, and motility. For example, CD81 plays an important role as a receptor for hepatitis C virus entry and for malaria parasite infection. The CD81 gene lies in a tumor suppressor gene region and may be a candidate for mediating cancer malignancy. CD151 is involved in increased motility, invasion, and metastasis of cancer cells. CD63 expression correlates with the invasiveness of ovarian cancer. A hallmark of tetraspanin membrane proteins is a cysteine-cysteine-glycine motif within the second, or large, extracellular loop.

Another aspect of the present invention provides a water-soluble variant of a G protein-coupled receptor (GPCR), characterized as follows:
(1) multiple hydrophobic amino acids within the transmembrane (TM) domain α-helical segments ("TM regions") of the GPCR are substituted, wherein:
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V), and phenylalanine (F);
(b) each leucine (L) is independently replaced with glutamine (Q), asparagine (N), or serine (S);
(c) each isoleucine (I) and each valine (V) is independently replaced with threonine (T), asparagine (N), or serine (S); and
(d) each phenylalanine is replaced with tyrosine (Y);
(2) all seven TM regions of the variant maintain α-helical secondary structure; and
(3) no transmembrane region is predicted.

In certain embodiments, the water-soluble variant comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 4-11, 13-20, 22-29, 31-38, 40-47, 49-56, and 58-64. It may further comprise one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 3, 12, 21, 30, 39, 48, and 57. In certain embodiments, the water-soluble variant binds a CXCR4 ligand.

In certain embodiments, the water-soluble variant comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 69-76, 78-85, 87, 89-96, 98-105, 107-114, and 116-123. It may further comprise one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 68, 77, 86, 88, 97, 106, 115, and 124. In certain embodiments, the water-soluble variant binds a CX3CR1 ligand.

In certain embodiments, the water-soluble variant comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 128-135, 137-144, 146-153, 155-162, 164-171, 173, and 175-182. It may further comprise one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 127, 136, 145, 154, 163, 172, 174, and 183. In certain embodiments, the water-soluble variant binds a CCR3 ligand.

In certain embodiments, the water-soluble variant comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 187-194, 196-203, 205-206, 208, 210-217, 219-225, and 227-234. It may further comprise one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 186, 195, 204, 207, 209, 218, 226, and 235. In certain embodiments, the water-soluble variant binds a CCR5 ligand.

In certain embodiments, the water-soluble variant comprises one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 236-243, 245-252, 254-261, 263-270, 272, 274-281, and 283-290. It may further comprise one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 235, 244, 253, 262, 271, 273, 282, and 291. In certain embodiments, the water-soluble variant binds a CXCR3 ligand.

In certain embodiments, the water-soluble variant comprises one or more transmembrane domains set forth in any one of SEQ ID NOs: 2, 67, 126, 185, 327, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, or 325. In certain embodiments, the variant is water-soluble and binds a ligand of the homologous native transmembrane protein.

Another aspect of the present invention provides a method for producing a protein in bacteria (e.g., E. coli). The method comprises:
(a) culturing the bacteria in a growth medium under conditions suitable for protein production;
(b) fractionating a lysate of the bacteria into a soluble fraction and an insoluble pellet fraction; and
(c) isolating the protein from the soluble fraction.
In this method, (1) the protein is a variant of a G protein-coupled receptor (GPCR) according to any one of claims 29 to 46, and (2) the yield of the protein is at least 20 mg per liter of growth medium (e.g., 30 mg/L, 40 mg/L, 50 mg/L, or more).

In certain embodiments, the bacteria are E. coli BL21 and the growth medium is LB medium. In certain embodiments, the protein is encoded by a plasmid within the bacteria. In certain embodiments, expression of the protein is under the control of an inducible promoter, such as an IPTG-inducible promoter. In certain embodiments, the lysate is produced by sonication. In certain embodiments, the lysate is centrifuged at 14,500 × g or more to produce the soluble fraction.
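Centrifuges are often set in rpm rather than × g. The standard conversion RCF = 1.118 × 10^-5 × r × rpm², with r the rotor radius in cm, gives the speed needed for the 14,500 × g spin. The 10 cm radius in the example is a hypothetical rotor; the actual radius comes from the rotor's specifications.

```python
import math


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that gives the requested relative centrifugal force.

    Uses RCF = 1.118e-5 * r * rpm**2, where r is the rotor radius in cm.
    """
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


if __name__ == "__main__":
    # Hypothetical 10 cm rotor: roughly 11,388 rpm for 14,500 x g.
    print(round(rpm_for_rcf(14_500, 10.0)))
```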

Another aspect of the present invention provides a method of treating a disorder or disease mediated by the activity of a membrane protein in a subject in need thereof. The method comprises administering to the subject an effective amount of a water-soluble polypeptide described herein.

In certain embodiments, the water-soluble polypeptide retains the ligand-binding activity of the membrane protein. Disorders and diseases that can be treated by administering the water-soluble peptides of the invention include, but are not limited to, cancer (e.g., small cell lung cancer, melanoma, triple-negative breast cancer), Parkinson's disease, cardiovascular disease, hypertension, and bronchial asthma.

Another aspect of the invention provides a pharmaceutical composition comprising a therapeutically effective amount of a water-soluble polypeptide of the invention and a pharmaceutically acceptable carrier or diluent.

In yet another aspect, the invention provides cells transfected with a subject water-soluble peptide comprising a modified α-helical domain. In certain embodiments, the cell is an animal cell (e.g., a human, non-human mammalian, insect, avian, fish, reptile, amphibian or other cell), a yeast cell or a bacterial cell.

The invention also includes computer-implemented methods, executed on a computer system, that comprise one or more of the methods (or steps thereof) described herein. The computer system includes a non-transitory computer-readable medium storing computer-executable instructions. When the computer system executes these instructions, they cause it to perform the methods contemplated herein. A further contemplated computer system comprises at least one memory and at least one processor coupled to the memory. The memory stores the sequence data and quantitative results described herein, and the processor is configured to perform the methods described herein. A user interface, such as a graphical user interface (GUI) used with an electronic display device, can be used to select the processing parameters that control the selection methods, including the computational methods described herein.

Another aspect of the invention provides a non-transitory computer-readable medium storing a set of instructions for performing any of the methods of the invention.

A further aspect of the invention provides a data processing system comprising a data processor. The data processor is operative to perform amino acid substitutions as in any of the methods of the invention, to rank the resulting protein variants with a ranking function, and to select water-soluble variants of a G protein-coupled receptor.

It should be understood that all embodiments of the invention, including those described under only one aspect of the invention (e.g., screening methods), are to be construed as applicable to all aspects of the invention (e.g., water-soluble proteins or methods of use). They are also to be construed as combinable with any one or more further embodiments of the invention. This applies unless an embodiment is expressly disclaimed or otherwise inappropriate, as would be readily understood by those skilled in the art.

The foregoing and other objects, features and advantages of the invention will be apparent from the following more detailed description of exemplary embodiments of the invention. These embodiments are illustrated in the accompanying drawings, in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis is instead placed on illustrating the principles of the invention.

FIGS. 1A-1D are general illustrations of the QTY code, which systematically substitutes the hydrophobic amino acids L, I, V and F with Q, T, T and Y, respectively. (FIG. 1A) Leucine and glutamine have similar molecular shapes. Likewise, isoleucine and valine are similar in shape to threonine, and phenylalanine is similar in shape to tyrosine. Leucine, isoleucine, valine and phenylalanine are hydrophobic and cannot bind water molecules. In contrast, glutamine can bind four water molecules, through two hydrogen donors and two hydrogen acceptors. The -OH groups of threonine and tyrosine can each bind three water molecules, through one hydrogen donor and two hydrogen acceptors.

FIG. 1B is a side view of an α-helix. After the QTY code of systematic amino acid changes is applied, the α-helix becomes water-soluble.

FIG. 1C is a top view of an α-helix before and after QTY code substitution. The helix on the left is a native membrane helix composed mainly of hydrophobic amino acids. The helix on the right is the same helix after QTY code substitution, and it now consists mostly of hydrophilic amino acids.

(FIG. 1D) Before the QTY code is applied, the GPCR membrane proteins are surrounded by hydrophobic lipid molecules and embedded within the lipid membrane (left part of FIG. 1D). After the QTY code is applied, the GPCR membrane protein becomes water-soluble and no longer requires surrounding detergent for stabilization (right part of FIG. 1D).

TMHMM prediction of the transmembrane domain regions of CXCR4. The prediction shows seven distinct hydrophobic transmembrane segments. In contrast, the TMHMM prediction for the CXCR4 variant subjected to the QTY substitution method of the invention (CXCR4-QTY) no longer shows seven distinct hydrophobic transmembrane segments.

Predicted α-helical wheel structure of the fully QTY-code-modified TM1 domain of CXCR4.

Illustration of candidate variants in each of the seven TM regions of the GPCR CXCR4.

Sequence alignments of the wild-type proteins and the respective QTY variants of CXCR4, CXCR3, CCR3 and CCR5. The QTY code is applied only to the seven hydrophobic transmembrane segments, not to the extracellular and intracellular segments.

Flowchart of an exemplary embodiment of the method.

Another flowchart of an exemplary embodiment of the method.

Illustration of a computer system of the invention.

Schematic flowcharts describing the processing steps of certain preferred embodiments of the invention.

A description of preferred embodiments of the invention follows. The words "a" and "an" are intended to include one or more, unless otherwise specified.

In some aspects, the invention relates to the use of a QTY (glutamine, threonine and tyrosine) substitution method (or "principle"), referred to as the "QTY code". The method converts the hydrophobic residues leucine (L), isoleucine (I), valine (V) and phenylalanine (F) in the seven transmembrane α-helices of a native protein into the hydrophilic residues glutamine (Q), threonine (T) and tyrosine (Y). In certain embodiments, as described above, asparagine (N) and serine (S) can also be used as replacement residues for L, I and/or V, but not for F. The invention can convert water-insoluble native membrane proteins into more water-soluble counterparts that still retain some or substantially all of the functions of the native protein.
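The following minimal Python sketch applies the QTY code described above. It assumes that the TM region boundaries are already known and are supplied as 0-based, end-exclusive (start, end) pairs. The function name `apply_qty` and its parameters are hypothetical and do not come from the source. The optional `code` argument shows how the alternative N/S replacements for L, I and V could be substituted for the default code.

```python
from typing import Dict, Iterable, Tuple

# Default QTY code from the text: L->Q, I->T, V->T, F->Y.
QTY_CODE: Dict[str, str] = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def apply_qty(seq: str,
              tm_regions: Iterable[Tuple[int, int]],
              code: Dict[str, str] = QTY_CODE) -> str:
    """Return `seq` with the substitution code applied inside TM regions only.

    tm_regions: 0-based, end-exclusive (start, end) pairs (assumed convention).
    Loop residues outside these regions, and residues not listed in `code`,
    are left unchanged.
    """
    residues = list(seq)
    for start, end in tm_regions:
        for i in range(start, end):
            residues[i] = code.get(residues[i], residues[i])
    return "".join(residues)


# Only positions 1-4 lie in the toy TM region; the trailing "LIVF" is a loop.
print(apply_qty("MLIVFALIVF", [(1, 5)]))  # MQTTYALIVF
# Alternative code using N/S for L, I and V (F is always replaced by Y).
print(apply_qty("LIVF", [(0, 4)], {"L": "N", "I": "S", "V": "S", "F": "Y"}))  # NSSY
```

Because each replacement is a one-for-one residue swap, the variant keeps the length and register of the native sequence.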

The invention includes methods for designing water-soluble peptides. The methods are first described for GPCR proteins, using human CCR3, CCR5, CXCR4 and CX3CR1 as specific examples. However, the general principles of the invention also apply to other proteins having transmembrane (α-helical) regions.

GPCRs typically have seven transmembrane α-helices (seven TMs) and eight non-transmembrane loops (eight NTMs) connected by the seven TM regions. The transmembrane segments may be referred to as TM1, TM2, TM3, TM4, TM5, TM6 and TM7. The eight non-transmembrane loops divide into four extracellular loops (EL1, EL2, EL3 and EL4) and four intracellular loops (IL1, IL2, IL3 and IL4). These eight loops include the N-terminal and C-terminal loops, each of which has a free end and is connected to only one TM region. A 7TM-GPCR protein can thus be divided into 15 fragments based on transmembrane and non-transmembrane features.
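The division of a 7TM protein into 15 fragments (8 loops and 7 TM segments) can be sketched as follows. The sketch assumes seven sorted, non-overlapping TM boundaries given as 0-based, end-exclusive pairs. The loop labels NTM0 to NTM7 are hypothetical. Mapping them onto EL1-EL4 and IL1-IL4 depends on the receptor's membrane topology, which this sketch does not model.

```python
from typing import List, Tuple


def split_7tm(seq: str, tm_regions: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Split a 7TM sequence into 15 alternating loop/TM fragments.

    Returns (label, subsequence) pairs in sequence order:
    NTM0, TM1, NTM1, TM2, ..., TM7, NTM7.
    """
    if len(tm_regions) != 7:
        raise ValueError("expected exactly seven TM regions")
    fragments: List[Tuple[str, str]] = []
    pos = 0
    for i, (start, end) in enumerate(tm_regions, 1):
        if not (pos <= start < end <= len(seq)):
            raise ValueError(f"TM{i} boundaries are out of order or out of range")
        fragments.append((f"NTM{i - 1}", seq[pos:start]))  # loop before TMi
        fragments.append((f"TM{i}", seq[start:end]))        # TM segment i
        pos = end
    fragments.append(("NTM7", seq[pos:]))  # C-terminal loop with a free end
    return fragments


seq = "ABCDEFGHIJKLMNOPQRSTUV"  # toy 22-residue sequence
tms = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18), (19, 21)]
for label, fragment in split_7tm(seq, tms):
    print(label, fragment)
```

Concatenating the 15 fragments in order reproduces the input sequence, so each fragment can be edited independently and the sequence then reassembled.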

One aspect of the invention provides a method of causing a computer program to execute a scripted procedure for selecting or preparing a water-soluble variant of a membrane protein (e.g., a G protein-coupled receptor (GPCR)). The method comprises:
(1) inputting the sequence of the membrane protein (e.g., GPCR) for analysis;
(2) obtaining a variant of the membrane protein (e.g., GPCR) in which a plurality of hydrophobic amino acids within the transmembrane (TM) domain α-helical segments ("TM regions") of the membrane protein are substituted, wherein
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V) and phenylalanine (F),
(b) each leucine (L) is independently substituted with glutamine (Q), asparagine (N) or serine (S),
(c) each isoleucine (I) and each valine (V) is independently substituted with threonine (T), asparagine (N) or serine (S), and
(d) each phenylalanine (F) is substituted with tyrosine (Y); and then
(3) obtaining α-helical secondary structure results for the variant to confirm that α-helical secondary structure is maintained within the variant; and
(4) obtaining transmembrane region results for the variant to confirm the water solubility of the variant,
thereby selecting a water-soluble variant of the membrane protein (e.g., GPCR).
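Steps (1) to (4) can be sketched as a single function that accepts the two predictions as pluggable callables. The predictors below are toy placeholders, not real tools. In practice, step (3) would call a secondary-structure predictor and step (4) would call a TM predictor such as TMHMM. The names `select_variant`, `toy_helix` and `toy_tm_count`, and the 5-residue run rule, are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

Region = Tuple[int, int]  # 0-based, end-exclusive TM boundaries (assumed)
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def select_variant(seq: str,
                   tm_regions: List[Region],
                   helix_maintained: Callable[[str], bool],
                   predicted_tm_count: Callable[[str], int]) -> Optional[str]:
    """Run steps (1)-(4) for one variant; return it if selected, else None."""
    # (1) The input sequence is `seq`.
    # (2) Obtain the variant by substituting L/I/V/F inside the TM regions.
    residues = list(seq)
    for start, end in tm_regions:
        for i in range(start, end):
            residues[i] = QTY.get(residues[i], residues[i])
    variant = "".join(residues)
    # (3) Confirm that alpha-helical secondary structure is maintained.
    if not helix_maintained(variant):
        return None
    # (4) Confirm water solubility: no predicted transmembrane region.
    if predicted_tm_count(variant) > 0:
        return None
    return variant


# Toy placeholders only, NOT real predictors.
def toy_helix(_: str) -> bool:
    return True


def toy_tm_count(s: str, min_run: int = 5) -> int:
    # Counts runs of at least `min_run` consecutive L/I/V/F residues.
    return len(re.findall(f"[LIVF]{{{min_run},}}", s))


print(select_variant("MKLIVFLLK", [(2, 8)], toy_helix, toy_tm_count))  # MKQTTYQQK
print(select_variant("MKLIVFLLK", [], toy_helix, toy_tm_count))        # None
```

Passing the predictors as arguments keeps the selection logic independent of any particular prediction tool. It also allows steps (3) and (4) to be run in either order or concurrently.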

As used herein, the terms "water-soluble variant of a membrane (spanning) protein" and "water-soluble membrane (spanning) variant" are used interchangeably.

The exact order in which the steps of the invention are performed may vary. For example, in certain embodiments, step (3) is performed before step (4). In other embodiments, step (3) is performed simultaneously with step (4). In still other embodiments, step (3) is performed after step (4).

In certain embodiments, the plurality of hydrophobic amino acids is randomly selected from all candidate hydrophobic amino acids L, I, V and F located in all TM regions of the protein. In certain embodiments, the plurality of hydrophobic amino acids makes up about 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% of all candidate hydrophobic amino acids L, I, V and F located in all TM regions of the protein. In certain embodiments, it makes up at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% or 50% of those candidates. In certain embodiments, it makes up no more than about 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60% or 50% of those candidates. In certain embodiments, the randomly selected hydrophobic amino acids L, I, V and F may be distributed approximately evenly across all TM regions. Alternatively, they may be distributed preferentially or exclusively in 1, 2, 3, 4, 5 or 6 TM regions.
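Random selection of a given fraction of the candidate L, I, V and F positions can be sketched with Python's `random.Random.sample`. This sketch pools the candidates from all TM regions and does not enforce any particular distribution across regions. The function names, and the choice to round the fraction to the nearest whole number of positions, are assumptions.

```python
import random
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # 0-based, end-exclusive (assumed convention)
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def candidate_positions(seq: str, tm_regions: List[Region]) -> List[int]:
    """All L/I/V/F positions located in any TM region."""
    return [i for start, end in tm_regions
            for i in range(start, end) if seq[i] in QTY]


def random_qty_variant(seq: str, tm_regions: List[Region], fraction: float,
                       rng: Optional[random.Random] = None) -> Tuple[str, List[int]]:
    """Substitute a random `fraction` of the candidate positions.

    Returns the variant and the sorted list of substituted positions.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = rng or random.Random()
    candidates = candidate_positions(seq, tm_regions)
    k = round(fraction * len(candidates))
    chosen = sorted(rng.sample(candidates, k))
    residues = list(seq)
    for i in chosen:
        residues[i] = QTY[residues[i]]
    return "".join(residues), chosen


# Seeded generator for reproducibility; 50% of the 8 candidates are replaced.
variant, positions = random_qty_variant(
    "LIVFAAAALIVF", [(0, 4), (8, 12)], 0.5, random.Random(0))
print(variant, positions)
```

To bias the selection toward particular TM regions, the sampling could be restricted to the candidates of those regions before `sample` is called.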

In certain embodiments, all candidate hydrophobic amino acids L, I, V and F in all TM regions of the protein are substituted. For example, every L is independently substituted with Q (or S or N), and/or every I and V is independently substituted with T (or S or N), and/or every F is substituted with Y. In certain embodiments, every L is substituted with Q, every I and V is substituted with T, and every F is substituted with Y.

In certain embodiments, selected hydrophobic amino acids L, I, V and F are not randomly substituted across all TM regions. Instead, all substitutions can first be confined to any one of the TM regions (e.g., the most N-terminal or most C-terminal TM region), and only the desired substitution variants are selected as members of a variant candidate library. All members of the library differ in their substitutions within the selected TM region. They may differ in the positions substituted (e.g., the 3rd rather than the 10th residue of the TM region), in the identity of the replacement residue (e.g., S rather than T for an I or V substitution), or in both. The desired substitution variants are selected based on predetermined criteria, such as a scoring system that takes into account α-helical secondary structure predictions and/or transmembrane region predictions.
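The members of a single-TM variant candidate library differ in which positions are substituted, in the replacement residue, or in both. Such a library can be enumerated with `itertools.product`. Each L, I or V contributes four choices (itself or one of its three allowed replacements) and each F contributes two, so the count grows multiplicatively. Exhaustive enumeration is therefore only practical for short segments, and a real TM segment would more likely be sampled. The candidates would then be scored, and only the desired ones kept. The function name is hypothetical.

```python
from itertools import product
from typing import Iterator

# Replacement options from the text. The original residue is also kept as an
# option, so that any subset of positions can be substituted.
OPTIONS = {"L": "QNS", "I": "TNS", "V": "TNS", "F": "Y"}


def enumerate_tm_variants(tm_seq: str) -> Iterator[str]:
    """Yield every substituted variant of a single TM segment.

    Variants differ in which L/I/V/F positions are substituted, in the
    replacement residue chosen, or both. The unmodified segment is skipped.
    """
    choices = [res + OPTIONS.get(res, "") for res in tm_seq]
    for combo in product(*choices):
        variant = "".join(combo)
        if variant != tm_seq:
            yield variant


variants = list(enumerate_tm_variants("ALF"))
print(len(variants))  # 7: 1 x 4 x 2 combinations minus the original
```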

The method can be repeated for 1, 2, 3, 4, 5 or 6 additional TM regions of the protein, or for all remaining TM regions. Each iteration produces a variant candidate library that can be stored in electronic memory or in a database. Within a given library, all variants differ in their substitutions within the selected TM region (see above) but are otherwise identical in the remaining TM regions and in the non-TM regions.

Domain swapping (domain shuffling) using sequences from two or more such libraries produces combinatorial variants with L, I, V and F substitutions in two or more TM regions. The number of possible combinatorial variants is the product of the library sizes, so it can approach millions even when each library has only a few members. For example, a GPCR with seven TM regions and eight members in each of its seven libraries gives 8^7, or about 2.1 million, combinatorial variants. In certain embodiments, the combinatorial variant library contains fewer than about 5 million, 4 million, 3 million, 2 million, 1 million or 500,000 members.
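The arithmetic in the example above can be checked directly, since the number of combinatorial variants is the product of the per-TM library sizes. The sketch also shows that `itertools.product` can generate the combinations lazily rather than storing millions of variants at once. Each library is represented here only by member indices 0 to 7, which is an illustrative simplification.

```python
from itertools import islice, product
from math import prod


def combinatorial_count(library_sizes):
    """Number of combinatorial variants: the product of the library sizes."""
    return prod(library_sizes)


print(combinatorial_count([8] * 7))  # 2097152, i.e. about 2.1 million

# Lazily generate combinations of member indices, one library per TM region.
libraries = [range(8)] * 7
first_three = list(islice(product(*libraries), 3))
print(first_three[0])  # (0, 0, 0, 0, 0, 0, 0)
```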

Accordingly, in certain embodiments, step (2) substitutes one subset of the plurality of hydrophobic amino acids within a single TM region of the protein (e.g., a GPCR) to create one member of a variant candidate library. It substitutes one or more different subsets of the plurality of hydrophobic amino acids to create further members of the library.

In certain embodiments, the method further comprises ranking all members of the library based on a combined score. The combined score is a weighted combination of the α-helical secondary structure prediction result and the transmembrane region prediction result.

As those skilled in the art will appreciate, domains with different sequences will likely have different predicted water solubility and different propensities to form α-helices. A "score" is assigned to a particular predicted water solubility or range of water solubility, and to a particular propensity or range of propensities to form α-helical structure. The score may be qualitative (0 or 1). For example, 0 may represent a domain with unacceptable predicted water solubility and 1 a domain with acceptable predicted water solubility. Such a score can be based on a threshold, for example. Alternatively, the score can be assessed on a scale, for example from 1 to 10, with higher values indicating greater water solubility. The score may also be quantitative, for example describing the predicted solubility in mg/mL. Once scores have been assessed for each domain, domain variants can easily be compared (or ranked) by one or, preferably, both scores. This allows selection of domain variants that are both water-soluble and α-helix-forming. Preferred embodiments may therefore use ranking functions that compute ranking data.
Water-soluble proteins prepared with the system described here can also be analyzed and characterized. Substitution combinations that fail to achieve a given biological function can then be used to constrain the computational model. This provides inputs to the system that enable more efficient information processing.
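The score forms described above (a 0/1 threshold score and a 1 to 10 scale) and ranking by both scores can be sketched as follows. The thresholds, ranges and names are hypothetical. Lexicographic ranking is used here, with solubility first and helix propensity second, which is one possible choice; a weighted sum would be another.

```python
from typing import List, NamedTuple


def binary_score(predicted: float, threshold: float) -> int:
    """Qualitative score: 1 if the prediction meets the threshold, else 0."""
    return 1 if predicted >= threshold else 0


def scale_score(value: float, lo: float, hi: float, steps: int = 10) -> int:
    """Map a value, clamped to [lo, hi], onto an integer scale 1..steps."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)
    return 1 + round((clamped - lo) / (hi - lo) * (steps - 1))


class DomainScore(NamedTuple):
    name: str
    solubility: int  # e.g. a 1-10 water-solubility score
    helix: int       # e.g. a 1-10 alpha-helix propensity score


def rank_domains(domains: List[DomainScore]) -> List[DomainScore]:
    """Sort by solubility, then by helix propensity, both descending."""
    return sorted(domains, key=lambda d: (d.solubility, d.helix), reverse=True)


ranked = rank_domains([DomainScore("a", 5, 9), DomainScore("b", 8, 2),
                       DomainScore("c", 8, 7)])
print([d.name for d in ranked])  # ['c', 'b', 'a']
```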

For example, the methods of the invention can be used to design one or more variants and produce them in vitro and/or in vivo. One or more biological functions of these variants can then be determined by any of a number of art-recognized methods. For a GPCR, for example, ligand binding and/or downstream signaling by a variant can be compared with that of the wild-type GPCR. The QTY substitution pattern used to produce a particular variant can then be associated with increased, maintained or decreased biological activity. Such structure/function information obtained from one or more variants can be used for machine learning. It can also impose further constraints on the computational model of the invention, so that the variants produced by the methods of the invention are ranked more efficiently. A new variant candidate whose substitution pattern closely matches that of a known successful variant can thus be ranked above other candidates. These include a candidate whose pattern matches known successful variants less closely, and a candidate whose pattern more closely matches that of a known unsuccessful variant.
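Ranking by similarity to known successful and unsuccessful substitution patterns can be illustrated with a simple set-similarity measure. This sketch assumes that a pattern is a set of (position, replacement residue) pairs. It also swaps in Jaccard similarity, which is not stated in the source. The score is the closest match to a known success minus the closest match to a known failure. All names are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# A substitution pattern: set of (position, replacement residue) pairs.
Pattern = FrozenSet[Tuple[int, str]]


def jaccard(a: Pattern, b: Pattern) -> float:
    """Jaccard similarity of two substitution patterns (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pattern_score(candidate: Pattern,
                  successes: Iterable[Pattern],
                  failures: Iterable[Pattern]) -> float:
    """Closest match to a known success minus closest match to a known failure."""
    best_ok = max((jaccard(candidate, s) for s in successes), default=0.0)
    best_bad = max((jaccard(candidate, f) for f in failures), default=0.0)
    return best_ok - best_bad


successes = [frozenset({(3, "Q"), (7, "T"), (10, "Y")})]
failures = [frozenset({(3, "N"), (12, "S")})]
a = frozenset({(3, "Q"), (7, "T")})
b = frozenset({(3, "N"), (7, "T")})
ranked = sorted([b, a], key=lambda p: pattern_score(p, successes, failures),
                reverse=True)
print(ranked[0] == a)  # True: a resembles the success, b the failure
```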

When run as a standalone software module or package (e.g., on a Linux system), the TMHMM program generates a score from 0 to 1 that can be used to predict the propensity to form transmembrane regions/proteins. This score can be used as a quantitative prediction of water solubility in the methods of the invention.

Thus, in certain embodiments, the α-helical secondary structure component of the ranking function may be a quantitative score. For example, it may be 0.5 or 1 when no α-helical secondary structure is predicted, and 0 when the predicted α-helical secondary structure is maintained. In certain embodiments, the transmembrane region result is obtained from a TM region prediction program such as TMHMM 2.0. Such a program provides a value between 0 and 1, where 0 indicates no predicted TM region and 1 indicates the strongest propensity to form one or more TM regions. The two scores can therefore be combined, directly or with weights, so that the combined score gives an overall assessment of both retained secondary structure and predicted water solubility (as measured by the propensity to form TM regions). For example, a combined score of 0 indicates that the variant has no predicted TM region yet maintains the predicted α-helical secondary structure, and is therefore a desired variant. By contrast, variants with a strong propensity to form TM regions (e.g., because many hydrophobic residues remain) tend to have larger combined scores. They are therefore less desirable under this scoring scheme.
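The scoring scheme described above can be sketched as a weighted sum in which lower is better. The α-helix component is 0 when the helix is maintained and a penalty (0.5 or 1 in the text) otherwise. The TM component is assumed to be a 0 to 1 propensity such as a TMHMM-derived value. `helix_weight` follows the weights discussed in the text: a share of the weight goes to the helix component and the remainder to the TM component. The function names and defaults are hypothetical.

```python
def helix_component(helix_maintained: bool, penalty: float = 1.0) -> float:
    """0 when the alpha-helix is predicted to be maintained, else a penalty."""
    return 0.0 if helix_maintained else penalty


def combined_score(helix_maintained: bool, tm_propensity: float,
                   helix_weight: float = 0.5, penalty: float = 1.0) -> float:
    """Weighted combination of both components; lower is better.

    tm_propensity: 0-1 value from a TM predictor (0 = no predicted TM region).
    helix_weight goes to the helix component, the remainder to the TM part.
    """
    if not 0.0 <= tm_propensity <= 1.0:
        raise ValueError("tm_propensity must lie in [0, 1]")
    if not 0.0 <= helix_weight <= 1.0:
        raise ValueError("helix_weight must lie in [0, 1]")
    return (helix_weight * helix_component(helix_maintained, penalty)
            + (1.0 - helix_weight) * tm_propensity)


print(combined_score(True, 0.0))   # 0.0: the desired variant
print(combined_score(False, 1.0))  # 1.0: helix lost and strong TM propensity
```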

In certain embodiments, the method includes removing variants whose α-helical secondary structure prediction results tend to indicate that the α-helical secondary structure is disrupted or broken. In certain embodiments, the method includes removing variants whose transmembrane region prediction results tend to indicate a strong propensity to form TM regions. The system can therefore include a BEAMing module that can exclude variants through further selection processing.
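Removing unwanted variants, as described above, can be sketched as a simple filter over per-variant prediction results. The source gives no numeric cutoff for a "strong propensity" to form TM regions, so `tm_cutoff` is a hypothetical parameter. This is an illustration of the filtering step only, not a description of the BEAMing module.

```python
from typing import List, NamedTuple


class ScoredVariant(NamedTuple):
    seq: str
    helix_maintained: bool  # from a secondary-structure predictor
    tm_propensity: float    # 0-1 value from a TM predictor such as TMHMM


def remove_unwanted(variants: List[ScoredVariant],
                    tm_cutoff: float = 0.5) -> List[ScoredVariant]:
    """Drop variants with a disrupted helix or a strong TM propensity.

    tm_cutoff is a hypothetical threshold; the text does not specify one.
    """
    return [v for v in variants
            if v.helix_maintained and v.tm_propensity < tm_cutoff]


pool = [ScoredVariant("A", True, 0.1), ScoredVariant("B", False, 0.0),
        ScoredVariant("C", True, 0.9)]
print([v.seq for v in remove_unwanted(pool)])  # ['A']
```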

In certain embodiments, the ranking function can include a weighting scheme that assigns a weight of 5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% to the α-helical secondary structure prediction result and the remainder to the transmembrane region prediction result. Depending on the desired property, such as biological function, the weights can be selected manually by the user or automatically by the software.

In certain embodiments, the method further comprises selecting the N members with the highest combined scores to form a first library of variant candidates for the TM region, where N is a predetermined integer (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more).
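Selecting the N best members can be sketched with `heapq.nsmallest` and `heapq.nlargest`. The paragraph above refers to the highest combined scores, but under the scheme described earlier in the text a combined score of 0 marks the most desirable variant. Which end counts as "best" therefore depends on how the score is oriented, so the sketch makes the direction a parameter. The library contents and names below are hypothetical.

```python
import heapq
from typing import Callable, List, TypeVar

T = TypeVar("T")


def top_n(library: List[T], n: int, score: Callable[[T], float],
          lower_is_better: bool = True) -> List[T]:
    """Return the n best-scoring members of a variant candidate library."""
    if lower_is_better:
        return heapq.nsmallest(n, library, key=score)
    return heapq.nlargest(n, library, key=score)


library = [("v1", 0.40), ("v2", 0.05), ("v3", 0.90), ("v4", 0.10)]
print(top_n(library, 2, score=lambda m: m[1]))  # [('v2', 0.05), ('v4', 0.1)]
```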

In certain embodiments, the method further comprises creating a variant candidate library for each of 1, 2, 3, 4, 5, 6 or all of the remaining TM regions of the protein (e.g., a GPCR). Each entry in a library can include fields that define the attributes of the entry, including ranking data generated by one or more ranking functions.

In certain embodiments, the method further comprises replacing two or more (e.g., all) TM regions of the protein (e.g., a GPCR) with the corresponding TM regions from the variant candidate libraries to create a combinatorial variant library. As used herein, a "corresponding TM region" is a TM region in a variant candidate library that is the same as, or homologous to, the TM region of the protein (e.g., GPCR) into which it is being combined. For example, suppose the second and third TM regions from the N-terminus of a GPCR are to be substituted. A TM region sequence from the library with substitutions only in the second TM region is imported/pasted/transferred into the second TM region of the GPCR. Likewise, a TM region sequence from the library with substitutions only in the third TM region goes into the third TM region. Together these produce a combinatorial variant.
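Combining library members into the corresponding TM regions, as described above, can be sketched as follows. The sketch assumes that each library holds replacement segments for one TM region, keyed by the 0-based index of that region. It also relies on QTY substitutions being length-preserving, so each segment must match the length of the region it replaces. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

Region = Tuple[int, int]  # 0-based, end-exclusive TM boundaries (assumed)


def assemble(seq: str, tm_regions: List[Region],
             replacements: Dict[int, str]) -> str:
    """Swap library segments into the TM regions indexed in `replacements`."""
    out: List[str] = []
    pos = 0
    for idx, (start, end) in enumerate(tm_regions):
        out.append(seq[pos:start])  # loop preceding this TM region
        segment = replacements.get(idx, seq[start:end])
        if len(segment) != end - start:
            raise ValueError("library segment length must match the TM region")
        out.append(segment)
        pos = end
    out.append(seq[pos:])  # trailing loop
    return "".join(out)


def combinatorial_variants(seq: str, tm_regions: List[Region],
                           libraries: Dict[int, List[str]]) -> Iterator[str]:
    """Yield one variant per combination of library members (domain swapping)."""
    idxs = sorted(libraries)
    for combo in product(*(libraries[i] for i in idxs)):
        yield assemble(seq, tm_regions, dict(zip(idxs, combo)))


# Toy protein with two TM regions, "LL" at 1-3 and "II" at 4-6.
print(list(combinatorial_variants("ALLBIIC", [(1, 3), (4, 6)],
                                  {0: ["QL", "QQ"], 1: ["TT"]})))
# ['AQLBTTC', 'AQQBTTC']
```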

In certain embodiments, substantially all (e.g., 96%, 97%, 98%, 99% or 100%) of the leucines are substituted with glutamine. In other embodiments, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90% or 95% of the leucines are substituted with glutamine. In certain embodiments, the same proportions of the isoleucines are substituted with threonine. In certain embodiments, the same proportions of the valines are substituted with threonine. In certain embodiments, the same proportions of the phenylalanines are substituted with tyrosine. In certain embodiments, one or more (e.g., 1, 2 or 3) of the leucines are not substituted. Likewise, in certain embodiments, one or more (e.g., 1, 2 or 3) of the isoleucines, of the valines, or of the phenylalanines are not substituted.

In certain embodiments, the method further comprises producing/expressing the combinatorial variants. In certain embodiments, the method further comprises testing the combinatorial variants for ligand binding (e.g., in vitro, or in a biological system such as a yeast two-hybrid assay) and selecting those whose ligand binding is substantially the same as that of the GPCR. In certain embodiments, the method further comprises testing the combinatorial variants for a biological function of the GPCR and selecting those whose biological function is substantially the same as that of the GPCR.

In certain embodiments, the sequence of the TM protein (e.g., a GPCR) includes information about its TM regions, such as the location of one or more (e.g., all) of its transmembrane regions. Such a sequence may belong to a protein with a solved crystal structure in which the TM regions are defined. It may also belong to a protein whose TM regions have been annotated on the basis of previous studies. Such information is readily available from public or proprietary databases such as PDB, UniProt, GenBank, EMBL and DDBJ.
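When the TM positions are known, the sequence can be split into TM segments and the non-TM segments between them. A minimal sketch follows. It assumes the positions are 1-based inclusive (start, end) pairs, as in UniProt and PDB annotations. The function name and example sequence are hypothetical.

```python
def split_by_tm(sequence, tm_ranges):
    """Split a sequence into non-TM and TM segments (hypothetical helper).

    tm_ranges: (start, end) pairs, 1-based and inclusive.
    Returns (loops, tms) with len(loops) == len(tms) + 1; loops[0] is the
    N-terminal segment and loops[-1] the C-terminal segment (either may be empty).
    """
    loops, tms = [], []
    prev_end = 0
    for start, end in sorted(tm_ranges):
        if start <= prev_end or end < start or end > len(sequence):
            raise ValueError(f"invalid or overlapping TM range {start}-{end}")
        loops.append(sequence[prev_end:start - 1])  # residues before this TM region
        tms.append(sequence[start - 1:end])         # the TM region itself
        prev_end = end
    loops.append(sequence[prev_end:])               # C-terminal remainder
    return loops, tms


loops, tms = split_by_tm("MKLLVAGSIFAKE", [(3, 5), (9, 11)])
```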

The Protein Data Bank (PDB) is a weekly updated repository of three-dimensional structural data for large biomolecules such as proteins and nucleic acids. Biologists and biochemists around the world submit the data, which are typically obtained by X-ray crystallography or NMR spectroscopy. The data are freely accessible on the Internet through the websites of the PDB's member organizations (PDBe, PDBj and RCSB). The PDB is overseen by the Worldwide Protein Data Bank (wwPDB). It is a key resource in areas of structural biology such as structural genomics. Most major scientific journals, and some funding agencies, now require scientists to submit their structural data to the PDB.

If the contents of the PDB are regarded as primary data, there are hundreds of derived (i.e., secondary) databases that classify those data in different ways. For example, SCOP and CATH both classify structures by structural type and assumed evolutionary relationships. GO classifies structures on the basis of genes, and crystallographic databases store information about the 3D structures of proteins. Any such publicly available database may be used to obtain input sequence information, including information on the presence and location of transmembrane regions.

Another publicly available database that can provide sequence information for the methods of the invention is UniProt. UniProt is a comprehensive, high-quality and freely accessible database of protein sequences and functional information, and many of its entries are derived from genome sequencing projects. It contains a large amount of information on the biological functions of proteins, drawn from the research literature. UniProt provides four core databases: UniProtKB (with the subsections Swiss-Prot and TrEMBL), UniParc, UniRef and UniMes.

Of these, UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from the scientific literature with computational analyses evaluated by biocurators. Its aim is to provide all known relevant information about a given protein, and its annotations are reviewed regularly to keep pace with current scientific findings. Manual annotation of an entry includes detailed analysis of the protein sequence and of the scientific literature. Sequences from the same gene and the same species are merged into a single database entry. Differences between those sequences are identified and their causes are documented (e.g., alternative splicing or natural variation).

Computational predictions are evaluated manually, and the relevant results are selected for inclusion in the entry. These predictions include post-translational modifications, transmembrane domains and topology, signal peptides, domain identification and protein family classification. All of them may be used to obtain useful sequence information on the TM regions used in the methods of the invention.
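UniProtKB flat-file entries record TM annotations on feature-table (`FT`) lines. The sketch below assumes the current flat-file layout, in which a helix appears as `FT   TRANSMEM        35..58`. It handles only plain numeric ranges and skips uncertain boundaries such as `<1` or `?`. The sample text is invented and is not taken from a real entry.

```python
import re

# Assumed layout: "FT", the feature key, then a start..end location.
TRANSMEM_RE = re.compile(r"^FT\s+TRANSMEM\s+(\d+)\.\.(\d+)\s*$")


def parse_uniprot_transmem(flat_text):
    """Return (start, end) pairs (1-based, inclusive) from FT TRANSMEM lines."""
    ranges = []
    for line in flat_text.splitlines():
        m = TRANSMEM_RE.match(line)
        if m:  # qualifier lines such as /note="Helical" do not match and are skipped
            ranges.append((int(m.group(1)), int(m.group(2))))
    return ranges


# Invented sample in the assumed format (not a real UniProt entry).
SAMPLE = """\
FT   TOPO_DOM        1..34
FT                   /note="Extracellular"
FT   TRANSMEM        35..58
FT                   /note="Helical; Name=1"
FT   TRANSMEM        72..95
FT                   /note="Helical; Name=2"
"""
tm_ranges = parse_uniprot_transmem(SAMPLE)
```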

In certain embodiments, the sequence of the TM protein (e.g., a GPCR) contains no information on the location of one or more (e.g., any) of its transmembrane regions. The one or more TM regions can, however, be predicted from sequence homology with a related protein whose TM regions are known. For example, the related protein may be a homologous protein from a different species.
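One way to carry TM positions over from a homolog is sketched below. It assumes a pairwise alignment has already been computed by an alignment tool and is supplied as two gapped strings of equal length. The function names and the toy alignment are hypothetical, and ranges are 1-based and inclusive.

```python
def map_positions(aligned_ref, aligned_query):
    """Map 1-based residue positions of the reference onto the query.

    Both strings come from one pairwise alignment and use '-' for gaps.
    Reference residues aligned to a gap have no entry in the mapping.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    r = q = 0
    for a, b in zip(aligned_ref, aligned_query):
        if a != "-":
            r += 1
        if b != "-":
            q += 1
        if a != "-" and b != "-":
            mapping[r] = q
    return mapping


def transfer_tm_ranges(tm_ranges, aligned_ref, aligned_query):
    """Project the reference protein's TM ranges onto the query sequence."""
    mapping = map_positions(aligned_ref, aligned_query)
    projected = []
    for start, end in tm_ranges:
        hits = [mapping[p] for p in range(start, end + 1) if p in mapping]
        if hits:  # drop a range that falls entirely into a gap
            projected.append((min(hits), max(hits)))
    return projected


# Toy alignment: the query lacks K2 of the reference and has an inserted T.
query_tms = transfer_tm_ranges([(3, 5)], "MKLLV-AGS", "M-LLVTAGS")
```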

In certain embodiments, the sequence of the TM protein (e.g., a GPCR) contains no information on the location of one or more (e.g., any) of its transmembrane regions, and such information cannot readily be obtained from known information. In this embodiment, the TM regions are calculated using an art-recognized method. One example is the TMHMM 2.0 program (transmembrane prediction using hidden Markov models), developed by the Center for Biological Sequence Analysis. Further details are given below.

In certain embodiments, the method further comprises providing a polynucleotide sequence for each variant of the protein (e.g., GPCR). Such polynucleotide sequences can easily be generated from the protein sequence (e.g., of the GPCR) and the known genetic code. In certain embodiments, the polynucleotide sequence is codon-optimized for expression in a host. The host may be a bacterium such as E. coli, a yeast such as Saccharomyces cerevisiae or fission yeast, an insect cell such as an Sf9 cell, a non-human mammalian cell, or a human cell.
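Generating a polynucleotide from a variant's protein sequence amounts to back-translation, sketched below. The codon table is an illustrative assumption. Each entry is a valid codon for its amino acid, but the table is not a codon-usage-optimized table for E. coli or any other host; a real codon-optimization tool would supply one. `back_translate` is a hypothetical name.

```python
# One valid codon per amino acid (illustrative only; NOT codon-usage optimized).
CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"


def back_translate(protein, add_stop=True):
    """Return a DNA sequence encoding `protein` using the table above."""
    try:
        dna = "".join(CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue {err.args[0]!r}") from None
    return dna + STOP if add_stop else dna


dna = back_translate("MQTY")
```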

In certain embodiments, the protein is a GPCR selected from the group consisting of:
- purinergic receptors (P2Y1, P2Y2, P2Y4, P2Y6) and M1 and M3 muscarinic acetylcholine receptors;
- thrombin receptors (protease-activated receptor (PAR)-1, PAR-2) and receptors for thromboxane (TXA2), sphingosine 1-phosphate (S1P2, S1P3, S1P4 and S1P5) and lysophosphatidic acid (LPA1, LPA2, LPA3);
- receptors for angiotensin II (AT1), serotonin (5-HT2c and 5-HT4), somatostatin (sst5), endothelin (ETA and ETB) and cholecystokinin (CCK1);
- the V1a vasopressin receptor, D5 dopamine receptor, fMLP formyl peptide receptor, GAL2 galanin receptor, EP3 prostanoid receptor, A1 adenosine receptor, α1 adrenergic receptors, BB2 bombesin receptor and B2 bradykinin receptor;
- the calcium-sensing receptor, chemokine receptors, the KSHV-ORF74 chemokine receptor, the NK1 tachykinin receptor and the thyroid-stimulating hormone (TSH) receptor;
- protease-activated receptors, neuropeptide receptors, the adenosine A2B receptor, P2Y purinergic receptors and metabotropic glutamate receptors; and
- GRK5, GPCR-30 and CXCR4.

In certain embodiments, the scripted steps of the method comprise a VBA script.

In certain embodiments, the scripted procedure can run on a Linux® system (e.g., Ubuntu 12.04 LTS), a Microsoft Windows operating system, or an Apple iOS operating system.

In certain embodiments, the method comprises all or substantially all of the following steps:
(1) if necessary, predicting the α-helical structure of the protein (e.g., a GPCR) to identify a first transmembrane region of the transmembrane protein;
(2) modifying a plurality of hydrophobic amino acids according to the QTY code defined herein to obtain a first modified transmembrane sequence;
(3) scoring the α-helical propensity of the first modified transmembrane sequence of (2) to obtain a structure score (e.g., in the context of a modified transmembrane protein containing that sequence);
(4) scoring the predicted water solubility of the first modified transmembrane sequence of (2) to obtain a solubility score (e.g., in the context of a modified transmembrane protein containing that sequence);
(5) repeating (2) to (4) to obtain a first library of putatively water-soluble first modified transmembrane variants;
(6) comparing the structure and solubility scores of each putatively water-soluble first modified transmembrane variant in the first library, and preferably ranking the variants using those scores;
(7) selecting a plurality of the putatively water-soluble first modified transmembrane variants to obtain a second library of such variants, where the plurality is an integer H, preferably less than 10, 9, 8, 7, 6, 5 or 4;
(8) repeating steps (1) to (7) for the second, third, fourth, fifth, sixth or seventh transmembrane region or, preferably, for all transmembrane regions of the protein (the total number of transmembrane regions modified by the method being an integer n);
(9) identifying the amino acid sequences of the protein that are not contained in any of the transmembrane regions modified in steps (1) to (8), including any extracellular or intracellular domains of the protein;
(10) producing combinatorial variants of the putatively water-soluble modified transmembrane protein (see above); and
(11) optionally, identifying the nucleic acid sequence of each of the putatively water-soluble modified transmembrane variants.
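Steps (2) to (7) for a single TM region can be sketched as follows, under explicit assumptions. The structure score of step (3) is stood in for by the mean Chou-Fasman helix propensity. The solubility score of step (4) is stood in for by the negative mean Kyte-Doolittle hydropathy. The two are combined with a hypothetical weight to rank the variants. The method above does not prescribe these scoring functions, and a real pipeline would more likely call dedicated predictors such as those described in this document. All function names are hypothetical.

```python
from itertools import combinations

# QTY code: Leu->Gln, Ile->Thr, Val->Thr, Phe->Tyr.
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}

# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Chou-Fasman helix propensities (P_alpha) as commonly tabulated.
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
           "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
           "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
           "P": 0.57, "G": 0.57}


def candidate_variants(tm, max_kept=1):
    """Steps (2)/(5): QTY-substitute every L/I/V/F except up to max_kept of them."""
    sites = [i for i, aa in enumerate(tm) if aa in QTY]
    seen = set()
    for k in range(max_kept + 1):
        for kept in combinations(sites, k):
            variant = "".join(aa if i in kept else QTY.get(aa, aa)
                              for i, aa in enumerate(tm))
            if variant not in seen:
                seen.add(variant)
                yield variant


def structure_score(seq):
    """Step (3) stand-in: mean helix propensity (higher = more helix-prone)."""
    return sum(P_ALPHA[aa] for aa in seq) / len(seq)


def solubility_score(seq):
    """Step (4) stand-in: negative mean hydropathy (higher = more hydrophilic)."""
    return -sum(KD[aa] for aa in seq) / len(seq)


def top_variants(tm, h=4, max_kept=1, weight=1.0):
    """Steps (6)-(7): rank by a weighted sum of both scores and keep the best h."""
    def combined(v):
        return structure_score(v) + weight * solubility_score(v)
    return sorted(candidate_variants(tm, max_kept), key=combined, reverse=True)[:h]


second_library = top_variants("ALIVFG", h=2)
```

With the default weight, hydropathy differences dominate, so the fully substituted segment ranks first. Lowering `weight` toward 0 favours variants that retain helix-promoting residues. The weight is a tuning knob introduced for this sketch, not a parameter of the method.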

The nucleic acid sequences identified by the above method can be used to generate a nucleic acid sequence for each of the putatively water-soluble modified transmembrane variants and for each of the non-transmembrane domains (including the extracellular and intracellular domains). These can then be expressed combinatorially to create a library of up to Hⁿ putatively water-soluble transmembrane protein variants. For example, if H is 8 and n is 7, a library of approximately 2 million water-soluble protein variants can be designed.
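The library-size bound follows from the fact that each of the n modified TM regions independently contributes one of its H selected variants, while the non-TM segments are held fixed. The library size is therefore a product over the regions:

$$
N_{\max} = \underbrace{H \times H \times \cdots \times H}_{n\ \text{TM regions}} = H^{n},
\qquad H = 8,\ n = 7:\quad 8^{7} = 2^{21} = 2{,}097{,}152 \approx 2.1 \times 10^{6}.
$$

The bound is reached only if every region retains exactly H variants. If any region keeps fewer, the product shrinks accordingly.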

Another aspect of the invention relates to the expression of water-soluble variant proteins (e.g., GPCRs) designed by the methods of the invention. This aspect is based in part on the surprising finding that such variants can be expressed at high levels in two kinds of system: in vitro cell-free expression systems, and commonly used cell-based expression systems such as E. coli. Moreover, the expressed proteins are highly soluble. They can be readily purified from the soluble fraction of the expression system, such as the soluble fraction of an E. coli culture lysate. Most membrane proteins, by contrast, typically end up in insoluble aggregates or pellets.

Accordingly, one aspect of the invention provides a method of producing a protein in a bacterium (e.g., E. coli), the method comprising:
(a) culturing the bacterium in a growth medium under conditions suitable for protein production;
(b) fractionating a lysate of the bacterium to produce a soluble fraction and an insoluble pellet fraction; and
(c) isolating the protein from the soluble fraction,
wherein
(1) the protein is a variant protein of the invention (e.g., a G protein-coupled receptor (GPCR)), and
(2) the yield of the protein is at least 20 mg per liter of growth medium (e.g., 30 mg/L, 40 mg/L, 50 mg/L or more).

In certain embodiments, the bacterium is E. coli BL21 and the growth medium is LB medium. In certain embodiments, the protein is encoded by a plasmid in the bacterium. In certain embodiments, expression of the protein is under the control of an inducible promoter; for example, the promoter may be inducible by IPTG. In certain embodiments, the lysate is produced by sonication. In certain embodiments, the lysate is centrifuged at 14,500 × g or more to produce the soluble fraction.
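Centrifugation conditions such as 14,500 × g are expressed as relative centrifugal force (RCF). RCF depends on rotor radius as well as speed, through the standard relation RCF = 1.118 × 10⁻⁵ × r × N², with r in cm and N in rpm. The sketch below converts between the two quantities. The 10 cm radius in the example is an assumed value; the actual radius should be taken from the rotor's specifications.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = RCF_CONSTANT * radius_cm * rpm**2


def rpm_for_rcf(rcf_g, radius_cm):
    """Rotor speed (rpm) that gives the requested RCF (x g) at the given radius."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("RCF and radius must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """RCF (x g) produced at `rpm` for a rotor of the given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


speed = rpm_for_rcf(14500, 10.0)  # assumed 10 cm rotor radius
```

For the assumed 10 cm radius, `rpm_for_rcf(14500, 10.0)` comes to roughly 11,400 rpm.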

Having described the general aspects of the invention above, particular features and specific embodiments of the invention are described further below.

Transmembrane Region Prediction
Certain methods of the invention include predicting the transmembrane regions of a protein such as a GPCR. Many programs and software packages for TM region prediction are known in the art. Any of them may be used, individually or in combination, in methods of the invention that require a TM region prediction step. These programs typically have a very simple user interface: the user supplies an input sequence in a specified format (such as FASTA or plain text), and the program returns the prediction as text, graphics, or both. Some programs also offer more advanced features, such as letting the user set specific parameters to fine-tune the prediction. All such programs can be used in the methods of the invention.

One exemplary TM region prediction program is TMHMM, provided by the Center for Biological Sequence Analysis at the Technical University of Denmark. It correctly predicts 97-98% of TM helices. TMHMM uses a hidden Markov model to predict transmembrane helices in proteins. The input protein sequence may be in FASTA format, and the output can be presented as an HTML page that includes a plot of the predicted TM region locations. Möller et al. compared prediction methods in a study entitled "Evaluation of Methods for the Prediction of Membrane Spanning Regions" (Bioinformatics 17(7):646-653, 2001). In that study, TMHMM was judged to be the best-performing transmembrane prediction program at the time of the evaluation.
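TMHMM 2.0 can also produce short-format output. In that format each protein is reported on one line of whitespace-separated `key=value` fields. The predicted topology is encoded as a string such as `i7-29o44-66i`, where `i` means inside, `o` means outside, and the helix spans lie in between. The parser below is a sketch that assumes this layout. The sample line is invented, and the field names should be checked against the output of the installed version.

```python
import re


def parse_tmhmm_short(line):
    """Parse one line of TMHMM 2.0 short-format output (assumed layout)."""
    fields = line.split()
    record = {"name": fields[0]}
    for token in fields[1:]:
        key, _, value = token.partition("=")
        record[key] = value
    topology = record.get("Topology", "")
    sides = re.findall(r"[io]", topology)  # membrane sides, N- to C-terminus
    spans = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    # Helix k crosses from sides[k] to sides[k + 1].
    record["helices"] = [
        {"start": s, "end": e, "from": sides[k], "to": sides[k + 1]}
        for k, (s, e) in enumerate(spans)
    ]
    return record


# Invented example line in the assumed format.
rec = parse_tmhmm_short(
    "myprot\tlen=120\tExpAA=44.10\tFirst60=22.05\tPredHel=2\tTopology=i7-29o44-66i")
```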

The programs compared in that study were as follows:
- TMHMM 1.0, 2.0 and a retrained version of 2.0 (Sonnhammer et al., Int. Conf. Intell. Syst. Mol. Biol., AAAI Press, Montreal, Canada, pp. 176-182, 1998; Krogh et al., J. Mol. Biol. 305(3):567-580, 2001);
- MEMSAT 1.5 (Jones et al., Biochemistry 33:3038-3049, 1994);
- Eisenberg (Eisenberg et al., Nature 299:371-374, 1982);
- Kyte/Doolittle (Kyte and Doolittle, J. Mol. Biol. 157:105-132, 1982);
- TMAP (Persson and Argos, J. Protein Chem. 16:453-457, 1997);
- DAS (Cserzo et al., Protein Eng. 10:673-676, 1997);
- HMMTOP (Tusnady and Simon, J. Mol. Biol. 283:489-506, 1998);
- SOSUI (Hirokawa et al., Bioinformatics 14:378-379, 1998);
- PHD (Rost et al., Int. Conf. Intell. Syst. Mol. Biol., AAAI Press, St. Louis, USA, pp. 192-200, 1996);
- TMpred (Hofmann and Stoffel, Biol. Chem. Hoppe-Seyler 374:166, 1993);
- KKD (Klein et al., Biochim. Biophys. Acta 815:468-476, 1985);
- ALOM2 (Nakai and Kanehisa, Genomics 14:489-911, 1992); and
- TopPred2 (Claros and von Heijne, Comput. Appl. Biosci. 10:685-686, 1994).

All of these can be used to predict TM regions in the methods of the invention. All cited references are incorporated herein by reference.
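Of the methods listed above, the Kyte/Doolittle hydropathy analysis is simple enough to sketch in a few lines. The version below averages Kyte-Doolittle values over a sliding 19-residue window. It reports the stretches whose average exceeds 1.6, which are the window size and cutoff usually quoted for flagging candidate TM helices with this scale. It is a crude baseline for illustration, not a substitute for the HMM-based programs described here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; list index = 0-based window start."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge above-threshold windows into 1-based inclusive (start, end) spans."""
    spans = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # overlaps previous span: extend it
            else:
                spans.append((start, end))
    return spans


toy = "D" * 10 + "L" * 25 + "D" * 10  # hydrophobic run at residues 11-35
segments = candidate_tm_segments(toy)
```

On the toy sequence the reported span (6-40) extends a few residues beyond the hydrophobic run at 11-35. This is the window-averaging blur that more refined methods such as PRED-TMR try to correct by locating segment ends explicitly.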

The principles of TMHMM are described in two references, both incorporated by reference. The first is Krogh et al., "Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes," Journal of Molecular Biology 305(3):567-580, January 2001. The second is Sonnhammer et al., "A hidden Markov model for predicting transmembrane helices in protein sequences," in J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff and C. Sensen (eds.), Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, AAAI Press, Menlo Park, CA, 1998.

DAS (Dense Alignment Surface; Stockholm University, Sweden) predicts transmembrane regions using the dense alignment surface method. It is described in Cserzo et al., "Prediction of transmembrane alpha-helices in procariotic membrane proteins: the Dense Alignment Surface method," Prot. Eng. 10(6):673-676, 1997. DAS is based on low-stringency dot plots of the query sequence against a set of library sequences (non-homologous membrane proteins), scored with a previously derived special scoring matrix. The method yields a high-precision hydrophobicity profile for the query, from which the locations of candidate transmembrane segments can be obtained. The novelty of the DAS-TMfilter algorithm is a second prediction cycle that predicts TM segments in the sequences of the TM library. To use the DAS server, the user enters a protein sequence at www.sbc.su.se/~miklos/DAS/, and the server predicts the TM regions of the input sequence.

HMMTOP (Hungarian Academy of Sciences, Budapest) is an automated server for predicting transmembrane helices and protein topology using hidden Markov models. It was developed by G. E. Tusnady at the Institute of Enzymology. The method used by the server is described in G. E. Tusnady and I. Simon (1998), "Principles Governing Amino Acid Composition of Integral Membrane Proteins: Applications to Topology Prediction," J. Mol. Biol. 283:489-506 (incorporated by reference). The new features of HMMTOP version 2.0 are described in G. E. Tusnady and I. Simon (2001), "The HMMTOP transmembrane topology prediction server," Bioinformatics 17:849-850 (incorporated by reference).

The MEMSAT2 transmembrane prediction page (www.sacs.ucsf.edu/cgi-bin/memsat.py) predicts transmembrane segments in proteins from input in FASTA or plain-text format. The underlying program, MEMSAT (1.5), is copyrighted by Dr. David Jones (Jones et al., Biochemistry 33:3038-3049, 1994). The latest version, MEMSAT V3, is a widely used method for predicting the structure of all-helical membrane proteins. It was benchmarked on a test set of transmembrane proteins of known topology. MEMSAT was estimated to be more than 78% accurate in predicting, from sequence data, both the structure of all-helical transmembrane proteins and the locations of their constituent helices within the membrane. MEMSAT-SVM is a highly accurate method for predicting transmembrane helix topology, and it can distinguish signal peptides and identify cytoplasmic and extracellular loops. Both MEMSAT3 and MEMSAT-SVM are part of the PSIPRED Protein Sequence Analysis Workbench, which brings together several structure prediction methods in one place at the University of London.

The Phobius server (phobius.sbc.su.se) predicts transmembrane topology and signal peptides from protein amino acid sequences in FASTA format. It is described in the following references (all cited technical content is incorporated by reference):
- Phobius: Käll et al., "A Combined Transmembrane Topology and Signal Peptide Prediction Method," Journal of Molecular Biology 338(5):1027-1036, 2004.
- PolyPhobius: Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information," Bioinformatics 21(Suppl 1):i251-i257, 2005.
- The Phobius web server: Käll et al., "Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server," Nucleic Acids Res. 35:W429-32, 2007.

SOSUI distinguishes membrane proteins from soluble proteins and also predicts transmembrane helices. It predicts transmembrane regions using a hydrophobicity analysis for topology and a probe-helix method for tertiary structure. Its accuracy in classifying proteins is reported to be as high as 99%, and its accuracy for transmembrane helix prediction is reported to be about 97%. The SOSUI system is available on the Internet at www.tuat.ac.jp/mitaku/sosui.

TMPred (Swiss node of the European Molecular Biology Network, EMBnet) predicts transmembrane regions and their orientation in a query sequence. The TMPred algorithm is based on a statistical analysis of TMbase, a database of naturally occurring transmembrane proteins. It makes its predictions using a combination of several weight matrices for scoring. See Hofmann & Stoffel (1993), "TMbase - A database of membrane spanning proteins segments," Biol. Chem. Hoppe-Seyler 374:166.

The SPLIT 4.0 server (split.pmfst.hr/split/4) is a membrane protein secondary structure prediction server. It uses the preference-function method to predict the transmembrane (TM) secondary structure of membrane proteins supplied in SWISS-PROT format. See Juretic et al., "Basic charge clusters and predictions of membrane protein topology," J. Chem. Inf. Comput. Sci. 42:620-632, 2002 (incorporated by reference).

PRED-TMR predicts transmembrane domains in a protein from the protein sequence alone. The algorithm refines standard hydrophobicity analysis by detecting candidate termini ("edges", i.e., starts and ends) of transmembrane regions. This allows it to discard highly hydrophobic regions that are not bounded by clear start and end configurations. It also allows it to confirm putative transmembrane segments that cannot be distinguished by their hydrophobic composition alone. On a test set of 101 non-homologous transmembrane proteins with reliable topologies, its accuracy compares well with that of other common existing methods. Applying the algorithm to all transmembrane proteins in the SwissProt database (release 35) lowered the prediction accuracy only slightly. See Pasquier et al., "A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm," Protein Eng. 12(5):381-385, 1999 (incorporated by reference).

The related PRED-TMR2 extends this approach with a preprocessing stage: an artificial neural network that distinguishes transmembrane proteins from soluble or fibrous proteins with high accuracy. On several test sets of transmembrane proteins, the system achieves a perfect (100%) prediction rate, classifying every sequence into the transmembrane class. On 995 non-transmembrane proteins extracted from the PDBselect database, the neural network incorrectly predicts 23 to be transmembrane proteins (97.7% correct assignment). See Pasquier and Hamodrakas, "An hierarchical artificial neural network system for the classification of transmembrane proteins," Protein Eng., 12(8):631-634, 1999 (incorporated by reference).
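The 97.7% figure for PRED-TMR2 follows directly from the counts reported above:

```latex
\frac{995 - 23}{995} = \frac{972}{995} \approx 0.977 = 97.7\%
```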

Protein α-Helix Secondary Structure Prediction
Certain methods of the invention include predicting the α-helix secondary structure of a protein, such as a GPCR. Many programs and software packages for this purpose are known in the art. Any of them may be used, individually or in combination, in methods of the invention that involve an α-helix secondary structure prediction step.

Early secondary structure prediction methods were limited to three predominant states: helix, sheet or random coil. They relied on the helix- or sheet-forming propensities of individual amino acids, sometimes combined with rules for estimating the free energy of forming secondary structure elements. Such methods were typically about 60% accurate in predicting which of the three states (helix/sheet/coil) a residue adopts. The first widely used technique for predicting protein secondary structure from amino acid sequence was the Chou-Fasman method.

Exploiting the information in multiple sequence alignments raised accuracy significantly, to nearly 80%. Knowing the full distribution of amino acids that occur throughout evolution at a position, and in its vicinity (typically about 7 residues on either side), gives a much clearer picture of the structural tendencies near that position.

For example, a given protein may have a glycine at a given position, which by itself might suggest a random coil. A multiple sequence alignment, however, may reveal that helix-favoring amino acids occur at that position (and at nearby positions) in 95% of homologous proteins across evolution. The same alignment can also be used to examine the average hydrophobicity at that and nearby positions, which may suggest a residue solvent-accessibility pattern consistent with an α-helix. Taken together, these factors would suggest that the glycine in the original protein adopts an α-helical structure rather than a random coil.

Accordingly, in the methods of the invention, α-helix secondary structure prediction programs may combine all available data to make a three-state prediction. These include programs based on neural networks, hidden Markov models and support vector machines. Such prediction methods also provide a confidence score for their prediction at every position.
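The alignment argument above can be made concrete with a short sketch. It measures how often helix-favoring residues occur in one column of a gapped multiple sequence alignment. The HELIX_FAVORING set below is a hypothetical, illustrative choice and is not taken from the source. Real predictors learn such statistics rather than using a fixed set.

```python
from collections import Counter

# Hypothetical "helix-favoring" residue set, for illustration only.
HELIX_FAVORING = set("AELMQKRH")


def column_profile(alignment, position):
    """Amino-acid frequencies at one column of a gapped alignment (gaps ignored)."""
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def helix_favoring_fraction(alignment, position, window=0):
    """Fraction of non-gap residues in the columns position-window..position+window
    that belong to the helix-favoring set."""
    length = len(alignment[0])
    lo, hi = max(0, position - window), min(length, position + window + 1)
    residues = [seq[i] for seq in alignment for i in range(lo, hi) if seq[i] != "-"]
    return sum(aa in HELIX_FAVORING for aa in residues) / len(residues)
```

Suppose the query has a glycine at column 1 but 19 of 20 aligned homologs carry alanine. Then `helix_favoring_fraction(alignment, 1)` returns 0.95, matching the 95% scenario described in the text.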

Secondary structure prediction methods are benchmarked continuously, for example by EVA. EVA runs ongoing benchmarks that evaluate the quality of protein structure prediction and secondary structure prediction methods. It compares methods that predict secondary and tertiary structure, such as homology modeling, protein threading and contact order prediction, against protein structures newly solved each week and deposited in the Protein Data Bank (PDB). The project aims to determine the prediction accuracy that non-expert users can expect from common, publicly available prediction web servers.

According to these tests, the most accurate methods at present are the following:
PSIPRED;
SAM (Karplus, "SAM-T08, HMM-based protein structure prediction," Nucleic Acids Res. 37 (Web Server issue): W492-497, 2009, doi:10.1093/nar/gkp403);
PORTER (Pollastri & McLysaght, "Porter: a new, accurate server for protein secondary structure prediction," Bioinformatics 21(8):1719-1720, 2005);
PROF (Yachdav et al., "PredictProtein--an open resource for online prediction of protein structural and functional features," Nucleic Acids Res. 42 (Web Server issue): W337-343, 2014, doi:10.1093/nar/gku366);
SABLE (Adamczak et al., "Combining prediction of secondary structure and solvent accessibility in proteins," Proteins 59(3):467-475, 2005, doi:10.1002/prot.20441).

The standard method for assigning secondary structure classes (helix/strand/coil) to PDB structures is DSSP (Kabsch and Sander, "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features," Biopolymers 22(12):2577-2637, 1983, doi:10.1002/bip.360221211). Predictions are benchmarked against these DSSP assignments. All of these references are incorporated by reference, and all of these methods may be used in the methods of the invention.

The DSSP algorithm is the standard method for assigning secondary structure to the amino acids of a protein from the protein's atomic-resolution coordinates. DSSP first identifies the backbone hydrogen bonds of the protein using a purely electrostatic definition. It assumes partial charges of −0.42e on the carbonyl oxygen and +0.20e on the amide hydrogen. The opposite charges (+0.42e and −0.20e) are assigned to the carbonyl carbon and the amide nitrogen. A hydrogen bond is identified if E in the following equation is less than −0.5 kcal/mol:

$$E = 0.084 \left\{ \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right\} \cdot 332\ \text{kcal/mol}$$

Here r_AB is the distance in Å between atoms A and B, and 0.084 = 0.42 × 0.20.
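The electrostatic hydrogen-bond test can be sketched in a few lines. This is a minimal illustration of the energy formula only, not the DSSP program itself. It assumes Cartesian coordinates in Å and uses the charges and the 332 kcal·Å/mol factor given above.

```python
import math

Q1, Q2 = 0.42, 0.20  # magnitudes of the partial charges on C=O and N-H (units of e)
F = 332.0            # dimensional factor, kcal*Angstrom/mol


def dssp_hbond_energy(o, c, n, h):
    """Electrostatic energy (kcal/mol) between a C=O group (atoms o, c)
    and an N-H group (atoms n, h); coordinates are (x, y, z) in Angstrom."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(o, c, n, h, cutoff=-0.5):
    """DSSP-style criterion: a hydrogen bond exists if E < -0.5 kcal/mol."""
    return dssp_hbond_energy(o, c, n, h) < cutoff


# Example: a roughly linear C=O...H-N arrangement with an O...H distance of 1.9 A.
carbon, oxygen = (-1.23, 0.0, 0.0), (0.0, 0.0, 0.0)
hydrogen, nitrogen = (1.9, 0.0, 0.0), (2.9, 0.0, 0.0)
energy = dssp_hbond_energy(oxygen, carbon, nitrogen, hydrogen)  # about -2.9 kcal/mol
```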

On this basis, eight types of secondary structure are assigned:
G, H and I: the 3₁₀ helix, α helix and π helix, respectively. Each is recognized by a repeating pattern of hydrogen bonds between residues 3, 4 or 5 residues apart, respectively.
B and E: the two β-sheet structures. B marks an isolated β bridge, and E marks longer sets of hydrogen bonds and β bulges.
T: turns, which feature hydrogen bonds typical of helices.
S: bends, i.e., regions of high curvature, where the angle between the vectors Cα(i−2) → Cα(i) and Cα(i) → Cα(i+2) is greater than 70°.
Blank (a space): used when no other rule applies; it denotes a loop.

These eight types are usually grouped into three larger classes: helix (G, H and I), strand (E and B) and loop (everything else).
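The reduction from eight DSSP states to three classes is a simple lookup. The sketch below assumes the 3-state labels H (helix), E (strand) and C (loop/coil), a common convention. Any character not listed, including the blank, is treated as loop.

```python
# Grouping of DSSP's eight states into three classes, as described above.
EIGHT_TO_THREE = {"G": "H", "H": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states):
    """Map a DSSP 8-state string (G, H, I, E, B, T, S, blank) to H/E/C.
    Anything not a helix or strand code (T, S, blank, '-') becomes loop 'C'."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)
```

For example, `reduce_dssp("HHHGGGIIEEBTS ")` gives `"HHHHHHHHEEECCC"`.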

PSIPRED (PSI-BLAST-based secondary structure prediction) is a technique for investigating protein structure that uses neural networks, i.e., machine learning. It is a server-side program with a website front end, and it predicts a protein's secondary structure (β-sheet, α-helix and coil) from its primary sequence. See bioinf.cs.ucl.ac.uk/psipred. The method uses information from evolutionarily related proteins to predict the secondary structure of a new amino acid sequence. Specifically, PSI-BLAST finds related sequences and builds a position-specific scoring matrix. A neural network constructed and trained to predict the secondary structure of the input sequence then processes this matrix.

The prediction method has three stages:
1. generation of a sequence profile;
2. initial secondary structure prediction;
3. filtering of the predicted structure.

PSIPRED first normalizes the sequence profile generated by PSI-BLAST and then predicts an initial secondary structure with a neural network. For each amino acid in the sequence, this network receives a window of 15 residues. Attached to the window is additional information indicating whether it extends past the N or C terminus of the chain. The result is an input layer of 315 input units, divided into 15 groups of 21 units. The network has a single hidden layer of 75 units and three output nodes, one for each secondary structure element (helix, sheet, coil).

A second neural network filters the structure predicted by the first network. It also receives a window of 15 positions, together with an indicator of whether the window falls at a chain end. This gives 60 input units, divided into 15 groups of 4. This network has a single hidden layer of 60 units and three output nodes, one for each secondary structure element (helix, sheet, coil). These three output nodes give the score of each secondary structure element for the central position of the window. PSIPRED generates its prediction by assigning the secondary structure with the highest score.

The Q3 value is the percentage of residues whose three-state secondary structure (helix, strand or coil) is correctly predicted.
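The input-layer arithmetic (15 × 21 = 315) and the Q3 measure can be illustrated with a small sketch. It is not PSIPRED's code. Two assumptions are made: the profile is already normalized with 20 values per residue, and the 21st unit of each group is a flag set to 1 when the window position lies beyond a chain terminus.

```python
from typing import List, Sequence

WINDOW = 15                 # residues per window for the first network
PROFILE_WIDTH = 20          # one profile value per amino acid (assumed already normalized)
UNITS = PROFILE_WIDTH + 1   # plus one terminus flag -> 21 units per position


def encode_window(profile: Sequence[Sequence[float]], centre: int) -> List[float]:
    """Flatten the 15-residue window centred on `centre` into 15 x 21 = 315 inputs."""
    half = WINDOW // 2
    inputs: List[float] = []
    for i in range(centre - half, centre + half + 1):
        if 0 <= i < len(profile):
            inputs.extend(profile[i])
            inputs.append(0.0)                       # position inside the chain
        else:
            inputs.extend([0.0] * PROFILE_WIDTH)
            inputs.append(1.0)                       # position beyond the N or C terminus
    return inputs


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state (H/E/C) label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For the first residue of a chain, seven of the 15 window positions fall before the N terminus, so seven terminus flags are set.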

Step-by-Step Description of Exemplary Embodiments
Using the invention as outlined above, certain non-limiting exemplary embodiments are described below with reference to the representative flowcharts in the figures.

FIG. 9A depicts one non-limiting embodiment of the invention. It shows, as a whole, a method 200 of the invention. In this method, selected hydrophobic amino acids L, I, V and F within the TM regions of a protein of interest (e.g., a GPCR) are substituted according to the "QTY code" of the invention. The substitutions are not limited to any particular TM region/domain.

In this specific embodiment, the method begins (202) by obtaining or reading (204) an input protein sequence, which may or may not be that of a transmembrane protein. The sequence can then be subjected to TM region prediction (206), if that information is not already available with the input sequence. It can also be subjected to α-helix secondary structure prediction using any art-recognized method. For example, TM region prediction can be performed with a program such as TMHMM (240). If this prediction yields no TM region at 242, one or more different TM region prediction programs, such as SOSUI, may be used (250) to predict the presence or absence of TM regions. If such programs predict no TM region at 252, the protein may contain no TM region (254), and the method ends (260).
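The fallback from one TM predictor to another (240 → 250 → 254/260) can be sketched as a chain of callables. The predictor wrappers are hypothetical placeholders. Neither TMHMM nor SOSUI is assumed to provide a Python API. Each wrapper is assumed to return a list of (start, end) TM regions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Region = Tuple[int, int]
Predictor = Callable[[str], List[Region]]


def predict_tm_with_fallback(
    sequence: str, predictors: Sequence[Tuple[str, Predictor]]
) -> Tuple[Optional[str], List[Region]]:
    """Run predictors in order and return the first non-empty result.

    Returns (None, []) when no predictor finds a TM region, corresponding to
    the "no TM region, method ends" branch of FIG. 9A.
    """
    for name, predict in predictors:
        regions = predict(sequence)
        if regions:
            return name, regions
    return None, []
```

In practice each hypothetical wrapper would call the real tool, or parse its output, and convert the result to 1-based inclusive (start, end) pairs.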

If, on the other hand, any suitable program predicts one or more TM regions at 242, the TM region protein sequences are obtained (244). The QTY code of the invention can then be applied to the hydrophobic amino acids L, I, V and F within those TM regions. According to the QTY code:
each leucine in a TM region can independently be substituted with glutamine (Q), serine (S) or asparagine (N), or left unsubstituted (212);
each isoleucine and valine in a TM region can independently be substituted with threonine (T), serine (S) or asparagine (N), or left unsubstituted;
each phenylalanine in a TM region can be substituted with tyrosine (Y) or left unsubstituted.

These QTY substitutions produce one or more putatively water-soluble variants of the original transmembrane protein. The number of substitutions made for each amino acid within a region can be selected as a parameter.
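A minimal sketch of applying the QTY code inside the predicted TM regions follows. For simplicity it uses only the default letters (L→Q, I→T, V→T, F→Y). The S and N alternatives and the option of leaving residues unsubstituted would be extra parameters. TM regions are assumed to be 1-based, inclusive (start, end) pairs.

```python
# Default QTY code; S or N are also permitted for L, I and V according to the text.
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def apply_qty(sequence, tm_regions, code=QTY):
    """Return a copy of `sequence` with QTY substitutions applied inside
    the TM regions only; residues outside the regions are untouched."""
    residues = list(sequence)
    for start, end in tm_regions:
        for i in range(start - 1, end):
            residues[i] = code.get(residues[i], residues[i])
    return "".join(residues)
```

For example, `apply_qty("MLIVFAKLIVF", [(2, 5)])` gives `"MQTTYAKLIVF"`. The second L-I-V-F stretch lies outside the TM region and is left unchanged.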

Next, the α-helix secondary structure of each putatively water-soluble variant can be predicted using any art-recognized program, such as PORTER (210). The results can be compared (208) with those for the original protein, preferably predicted with the same program (e.g., PORTER). The α-helix secondary structure of the original protein can be predicted with any art-recognized program before, during or after the TM region prediction step for the original protein.

The α-helix secondary structure prediction may show at 214 that the candidate water-soluble variant retains all or most of the α-helix secondary structure of the original protein. This suggests that the variant's particular pattern of QTY substitutions does not affect, or does not significantly affect, that secondary structure. TM region prediction (220) and confirmation (222) can then be performed, and the variant sequence generated (224). Optionally, if the result at 214 shows that one or more α-helix secondary structures of the original protein are disrupted, the variant can be discarded at this step as undesirable, ending the method.

The methods of the invention also require showing that the predicted QTY variant has a lower tendency, or no tendency, to form TM regions compared with the original protein. The putatively water-soluble variant can therefore be subjected to TM region prediction. If needed, this can use the same program used for the initial TM region prediction of the original protein. If the results show that significant TM regions are still present, the variant may be discarded. If instead the results show no TM region, or a low tendency to form TM regions, the variant can be selected as a desired variant. Such a variant is more water-soluble than the original protein but retains its α-helix secondary structure, and therefore probably retains the protein's function.
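The two checks described above can be combined into one accept/reject decision, sketched below. Two assumptions apply: secondary structure predictions are available as per-residue 3-state strings, and the TM predictor reports a count of TM regions for the variant. The `min_retention` and `max_tm` thresholds are illustrative values and are not taken from the source.

```python
def helix_retention(original_ss: str, variant_ss: str) -> float:
    """Fraction of the original protein's helical positions ('H') that are
    still predicted helical in the variant."""
    helix_positions = [i for i, s in enumerate(original_ss) if s == "H"]
    if not helix_positions:
        return 1.0
    kept = sum(variant_ss[i] == "H" for i in helix_positions)
    return kept / len(helix_positions)


def accept_variant(original_ss, variant_ss, variant_tm_count,
                   min_retention=0.9, max_tm=0):
    """Keep a variant only if it preserves (most of) the alpha-helical structure
    and is predicted to have at most `max_tm` TM regions.
    The threshold values are hypothetical defaults."""
    return (helix_retention(original_ss, variant_ss) >= min_retention
            and variant_tm_count <= max_tm)
```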

If desired, additional steps can be performed to further characterize the resulting water-soluble variants. Such characterization may include calculating the pI of the variant (226) and comparing it with the pI of the original protein. The pI should remain unchanged or change only slightly, i.e., by less than 30%, preferably less than 20%, more preferably less than 10%. Further characterization may include creating a helical wheel model (246), for example as shown in FIG. 3. The model shows the positions, and any clustering, of QTY substitutions in a particular TM region.
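The pI comparison can be sketched with the Henderson-Hasselbalch equation and a bisection search for the pH at which the net charge is zero. The pKa values below are one commonly quoted approximate set and are an assumption. Other scales shift the pI slightly, so the original and the variant should be computed with the same scale.

```python
# Approximate pKa values (an assumed, commonly quoted set).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(sequence, tol=1e-4):
    """Bisection on pH; net charge decreases monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def relative_pi_change(original, variant):
    """|pI(variant) - pI(original)| / pI(original), for the 30%/20%/10% criteria."""
    p0 = isoelectric_point(original)
    return abs(isoelectric_point(variant) - p0) / p0
```

A variant would then meet the preferred criterion if `relative_pi_change(original, variant) < 0.10`.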

Another exemplary embodiment of the invention designs the transmembrane regions of a protein (e.g., a GPCR) according to the QTY code of the invention. It can be carried out on a computer system using the representative method 10 shown in FIG. 9B, several detailed steps of which are described below. Many of the steps are optional or can be combined in accordance with the methods of the invention.

1: In step 1, the computer interface of a computer system receives a protein sequence. A protein is selected for analysis, and data describing it (e.g., its sequence) are uploaded or entered through the computer interface (12). The input data may be a protein name, a database reference or a protein sequence; for example, the protein sequence can be uploaded via the computer interface.

2: In step 2, additional data about the protein, including its name or sequence, can be identified, determined, obtained and/or entered via the computer interface. One source of protein data (20) is the UniProt database (www.uniprot.org). Alternatively, the method may store data about the protein, or about related sequences, for the user to retrieve later in this step. In embodiments, the program can prompt the user to select a database or file to search for additional data (e.g., sequence data) about the protein selected for analysis.
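Sequence data retrieved in this step are commonly in FASTA format, and a minimal parser is sketched below. The UniProt REST URL pattern in `fetch_uniprot_fasta` is an assumption about UniProt's current web service. That function needs network access and is shown only for completeness.

```python
from urllib.request import urlopen


def fetch_uniprot_fasta(accession):
    """Download one UniProtKB entry in FASTA format (assumed URL pattern; needs network)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}; sequence lines are concatenated."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```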

3: In step 3, the user can enter, upload or obtain data identifying the transmembrane regions. For example, the user can be prompted to obtain the data from a public source such as UniProt. That information can be verified (30) and retrieved from the database for use in step 5.
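Transmembrane annotations can be pulled from a UniProt flat-file entry with a sketch like the one below. It assumes the feature-table layout "FT   TRANSMEM        41..61", and also accepts the older column layout "FT   TRANSMEM     41     61". Both layouts are assumptions about the file format and should be checked against the downloaded files.

```python
import re

# Matches "FT   TRANSMEM        41..61" and the older "FT   TRANSMEM     41     61".
TRANSMEM = re.compile(r"^FT\s+TRANSMEM\s+<?(\d+)(?:\.\.|\s+)>?(\d+)")


def transmembrane_regions(flat_file_text):
    """Return the (start, end) pairs (1-based, inclusive) of all TRANSMEM features."""
    regions = []
    for line in flat_file_text.splitlines():
        match = TRANSMEM.match(line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions
```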

4: Alternatively or additionally, when TM region information is not readily available for the input protein sequence, the transmembrane regions can be established by any art-recognized method (40). Transmembrane regions are generally characterized by an α-helix conformation. For example, transmembrane helix prediction can be performed with TMHMM2.0 (transmembrane prediction using hidden Markov models). This software module/package was developed by the Center for Biological Sequence Analysis (www.cbs.dtu.dk/services/TMHMM).

This version of the software can have problems with peak finding and may fail to find all seven TM regions of a GPCR. A modified version can therefore be used if needed, in which a dynamic baseline is introduced into the peak-finding method executed by the computer system. For a GPCR, for example, the baseline can be lowered if the initial baseline value does not reveal all seven TM regions. The default baseline may be set to 0.2 and then lowered to 0.1 to identify a missing seventh transmembrane region. If eight or more TM regions are then found, the baseline can be raised to a higher value, such as 0.15, to exclude spurious TM predictions. For example, when the amino acid sequence of CCR-2 was submitted to TMHMM2.0, only six transmembrane regions were initially identified. When the TMHMM2.0 baseline value was set to 0.07, however, the correct total of seven transmembrane regions was identified. The results of the TM region prediction are then passed to step 5.
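The dynamic-baseline idea can be sketched on a per-residue TM probability profile of the kind TMHMM reports. These functions are hypothetical and are not part of TMHMM. Segments are runs above the baseline that are at least `min_len` residues long. The baseline is lowered when too few segments are found and raised when too many are found. The schedule of candidate values is an assumption; the text mentions 0.2, 0.15, 0.1 and 0.07.

```python
def find_segments(probs, baseline, min_len=5):
    """Contiguous runs (1-based, inclusive) where the TM probability exceeds `baseline`."""
    segments, start = [], None
    for i, p in enumerate(probs):
        if p > baseline and start is None:
            start = i
        elif p <= baseline and start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i))
            start = None
    if start is not None and len(probs) - start >= min_len:
        segments.append((start + 1, len(probs)))
    return segments


def dynamic_baseline_search(probs, expected=7, baseline=0.2,
                            lower=(0.15, 0.1, 0.07, 0.05),
                            higher=(0.25, 0.3, 0.4), min_len=5):
    """Return (baseline, segments) for the first baseline that yields `expected`
    segments, moving down if too few are found and up if too many; otherwise
    return the closest count found."""
    segments = find_segments(probs, baseline, min_len)
    if len(segments) == expected:
        return baseline, segments
    candidates = lower if len(segments) < expected else higher
    best = (baseline, segments)
    for b in candidates:
        segs = find_segments(probs, b, min_len)
        if len(segs) == expected:
            return b, segs
        if abs(len(segs) - expected) < abs(len(best[1]) - expected):
            best = (b, segs)
    return best
```

Consider a synthetic profile with seven helices, one of which peaks at only 0.12. It yields six segments at 0.2 and 0.15, and all seven at 0.1.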

5: In step 5, the TM data are first identified, either by new prediction or from the initial sequence input. The GPCR sequence is then divided (50), according to the TM region information, into a total of 15 fragments: seven transmembrane segments (7 TMs) (52) and eight non-transmembrane segments (8 NTMs) (54). That is, each typical GPCR should have seven TM fragments and eight NTM fragments.
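Splitting a sequence into alternating non-TM and TM fragments is sketched below. It assumes sorted, non-overlapping, 1-based inclusive TM regions. With seven TM regions it returns 7 TM fragments and 8 NTM fragments: the N-terminal segment, six loops and the C-terminal segment.

```python
def split_tm_ntm(sequence, tm_regions):
    """Return (tm_fragments, ntm_fragments); n TM regions give n TM and n + 1 NTM fragments."""
    tms, ntms, previous_end = [], [], 0
    for start, end in sorted(tm_regions):
        ntms.append(sequence[previous_end:start - 1])  # segment before this TM region
        tms.append(sequence[start - 1:end])            # the TM region itself
        previous_end = end
    ntms.append(sequence[previous_end:])               # C-terminal segment
    return tms, ntms
```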

The system can perform one or more of the above steps, e.g., all of them, using a computer interface for user input. The system can also omit one or more of the above steps or combine two or more of them.

6: In step 6, QTY substitutions are made for a selected subset of the hydrophobic amino acids L, I, V and F within a given TM region of the protein (60). Specifically, a first transmembrane region is selected for mutation; this is typically, but not necessarily, the region closest to the N-terminus of the protein. Some or all of its hydrophobic amino acids (L, I, V and F) are then replaced with the corresponding non-ionic hydrophilic amino acids: Q/S/N for L, T/S/N for I and V, and Y for F. No amino acid is actually substituted in the protein itself at this point. Rather, the amino acid designation is substituted in the sequence used for modeling, so the term "sequence" includes "sequence data."

Typically, most or all of the hydrophobic amino acids are selected for substitution. If fewer than all are selected, it may be desirable to leave one or more N- and/or C-terminal amino acids of the transmembrane region hydrophobic and to select internal hydrophobic amino acids. The following options may also be used, additionally or alternatively:
select for substitution all of the leucines (L), all of the isoleucines (I), all of the valines (V) and/or all of the phenylalanines (F) within the transmembrane region;
retain one or more phenylalanines, valines, leucines and/or isoleucines within the transmembrane region;
retain one or more hydrophobic amino acids within a transmembrane region whose wild-type sequence is characterized by three or more consecutive hydrophobic amino acids.
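Partial QTY substitution of a single TM segment can be enumerated as sketched below. It assumes the default QTY letters and two options: residues at the segment ends can be protected with `keep_termini`, and up to `max_retained` of the remaining eligible hydrophobic residues can be left unsubstituted. The number of variants grows combinatorially, so `max_retained` should stay small.

```python
from itertools import combinations

QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def partial_qty_variants(segment, max_retained=2, keep_termini=0):
    """Yield QTY variants of one TM segment.

    The first and last `keep_termini` residues are never substituted; among the
    remaining L/I/V/F sites, every subset of size 0..max_retained is kept
    hydrophobic while all other sites are substituted.
    """
    inner = range(keep_termini, len(segment) - keep_termini)
    sites = [i for i in inner if segment[i] in QTY]
    for k in range(min(max_retained, len(sites)) + 1):
        for retained in combinations(sites, k):
            residues = list(segment)
            for i in sites:
                if i not in retained:
                    residues[i] = QTY[segment[i]]
            yield "".join(residues)
```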

7: In step 7, the transmembrane region designed in this way is placed back into the sequence context of the original protein. Each set of substitutions generates one specific putative variant of that TM region. The redesigned TM region carrying QTY substitutions (62) is therefore exchanged for the corresponding TM region of the original protein, generating a transmembrane variant, or "putative variant" (70). Together, these related putative variants form a first library of putative variants.
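Placing a redesigned TM segment back into the full sequence, and collecting the results as a first library, can be sketched as follows. QTY substitution is one-for-one, so the designed segment is assumed to have the same length as the region it replaces. The function names are illustrative.

```python
def splice_tm(sequence, region, designed_segment):
    """Replace the TM region (1-based, inclusive) with the designed segment."""
    start, end = region
    if end - start + 1 != len(designed_segment):
        raise ValueError("designed segment must match the TM region length")
    return sequence[:start - 1] + designed_segment + sequence[end:]


def first_library(sequence, region, designed_segments):
    """One putative full-length variant per designed version of the TM region."""
    return [splice_tm(sequence, region, seg) for seg in designed_segments]
```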

8: In steps 82 and 84, each putative variant is then subjected to the transmembrane region prediction methods described herein (84), e.g., to detect loss of predicted TM regions. The variant is also scored for the propensity of its sequence to form α-helices (82). It is further subjected to the water-solubility prediction methods described herein; for example, it is scored for the propensity of its sequence to be water-soluble. Such a score may be based on the predicted propensity to form TM regions. A strong propensity to form TM regions is associated with low water solubility, while a weak propensity, or none, is associated with high water solubility. Complete water solubility at all concentrations is not required for most commercial purposes. The required water solubility is preferably defined as that needed for functionality under the expected conditions of use (e.g., a ligand-binding assay).
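A TM-propensity score of the kind described above can be approximated with a Kyte-Doolittle hydropathy window, a classic proxy. This technique is swapped in for illustration only; the source does not say the method uses it. The 19-residue window and the 1.6 threshold are the values commonly associated with this scale.

```python
# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def max_window_hydropathy(sequence, window=19):
    """Highest mean hydropathy over any window (unknown residues count as 0)."""
    if not sequence:
        return 0.0
    window = min(window, len(sequence))
    return max(
        sum(KD.get(aa, 0.0) for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    )


def likely_tm(sequence, window=19, threshold=1.6):
    """Crude proxy: a window averaging above the threshold suggests a TM helix."""
    return max_window_hydropathy(sequence, window) > threshold
```

With this proxy, an L-I-V-F-rich stretch such as `"LIVF" * 6` scores as TM-like, while its QTY counterpart `"QTTY" * 6` does not.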

9: In step 9, putative variants predicted to lose α-helix structure and/or to be "water-insoluble" under the expected conditions of use are discarded. Putative variants predicted to have α-helix structure and water solubility can be selected using a combined score or rank (90). This score is a weighted combination, based on a ranking function, of the α-helix secondary structure prediction results and the TM region/water-solubility prediction results. For example, a variant's α-helix structure may be predicted to be compromised. Transmembrane variants that are highly water-soluble, or that are characterized by 0, 1, 2 or 3 hydrophobic amino acids, can still be selected, e.g., by giving a higher weight to the water-solubility prediction results. Alternatively or additionally, variants with highly α-helical structure characterized by 3, 4, 5 or 6 hydrophobic amino acids can be selected, e.g., by giving a higher weight to the α-helix secondary structure prediction results.

10: In step 10, the putative variants within the same library can be sorted or ranked (100) according to the scoring scheme (94) outlined above. A predetermined number of putative variants can then be selected as the final members of the first putative variant library. For example, with the combination score described above, a score of 0 means no tendency to form TM regions and complete preservation of the original α-helical secondary structure, and therefore identifies the most desirable putative variant. A slightly higher score may indicate a slight tendency to form TM regions (or a lower propensity to be water soluble). Such a putative variant is less desirable, but it can still be selected if its combination score is better than those of the other putative variants in the library.
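Selecting a predetermined number of the best-ranked variants, as in step 10, amounts to taking the lowest combination scores. A minimal sketch, assuming the scores are already computed and held in a dictionary keyed by a hypothetical variant name; ties are broken by name only so that the output is deterministic.

```python
import heapq


def select_library(scores: dict[str, float], n: int) -> list[str]:
    """Return the n best variants (lowest combination score first, ties by name)."""
    best = heapq.nsmallest(n, scores.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in best]


scores = {"v1": 0.3, "v2": 0.0, "v3": 0.1, "v4": 0.1}
print(select_library(scores, 3))  # ['v2', 'v3', 'v4']
```

`heapq.nsmallest` avoids sorting the whole list when only a handful (for example 1 to 10) of variants are kept per region.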

In certain embodiments, a predetermined number of desired putative variants, such as 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1, can be selected.

These steps (e.g., steps 6 to 10) can be repeated for the second, third, fourth, fifth, sixth and/or seventh (or further) transmembrane regions or domains, generating one library of putative variants for each such TM region or domain.

11: In step 11, combinations of TM regions or domains carrying putative variants with unsubstituted non-TM regions can be selected (110). For example, one, two, three or four domains containing putative variants with a high α-helical structure score can be combined with one, two, three, four, five or six domains containing putative variants with a high water-solubility score. In another example, when selecting among multiple variants, a domain/TM region in which all hydrophobic amino acids have been replaced with hydrophilic amino acids (thereby maximizing its water-solubility score) can be combined with a second domain/TM region that retains 3, 4 or 5 hydrophobic amino acids. As is known in the art, the selected putative variants can be "recombined" with the extracellular and intracellular domains to create an initial combinatorial library of putatively water-soluble protein variants.
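Recombining one selected variant per TM region with the unsubstituted non-TM segments is a Cartesian product over the per-region libraries. The sketch below assumes, hypothetically, that the protein is represented as plain one-letter strings: n + 1 non-TM segments (N-terminus, connecting loops, C-terminus) interleaved with n TM regions. Real constructs would also need to preserve segment boundaries exactly as defined by the TM prediction.

```python
from itertools import product
from typing import Iterator


def assemble_variants(loops: list[str], tm_libraries: list[list[str]]) -> Iterator[str]:
    """Yield every full-length sequence built from one TM variant per region.

    loops: n + 1 unsubstituted non-TM segments (N-terminus, loops, C-terminus).
    tm_libraries: n lists of putative variants, one list per TM region.
    """
    if len(loops) != len(tm_libraries) + 1:
        raise ValueError("need exactly one more non-TM segment than TM regions")
    for combo in product(*tm_libraries):
        parts = [loops[0]]
        for tm, loop in zip(combo, loops[1:]):
            parts.extend((tm, loop))
        yield "".join(parts)


# Toy example with two TM regions (2 x 1 = 2 combinations).
print(list(assemble_variants(["M", "G", "K"], [["QQ", "TT"], ["YY"]])))
# ['MQQGYYK', 'MTTGYYK']
```

Using a generator keeps memory flat even when the library reaches millions of members.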

In certain embodiments, all or a fragment of the putatively water-soluble protein variants of an initial combinatorial library designed as described herein can be prepared (produced or expressed in vitro or in host cells) and screened, preferably by high-throughput screening, for water solubility and/or ligand binding. For example, by amplifying the library, a subset (less than 100%) of the putatively water-soluble combinatorial protein variants can be expressed. Reporter systems, as are well known in the art, can be used to screen for ligand binding. Using the methods of the invention, a library of putatively water-soluble modified transmembrane combinatorial variants comprising functionally combined extracellular and intracellular domains can be rapidly identified, and water-soluble protein variants can be produced that have the proper three-dimensional structure of the wild-type protein and retain ligand-binding function (including binding affinity) or other functions. The software can include a learning module that uses the confirmed functionality of the protein variants to exclude certain variants or to rank them differently.
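The learning module is not specified further, so the following is only a simple rule-based stand-in, not a trained model: confirmed hits are promoted, failed variants are excluded, and untested variants keep their predicted order. The function name and the assay-result format are hypothetical.

```python
def rerank_with_assays(ranked: list[str], assay: dict[str, bool]) -> list[str]:
    """Re-rank variants using screening results.

    ranked: variant ids ordered best-first by predicted combination score.
    assay:  variant id -> True (solubility/binding confirmed) or False (failed);
            ids absent from the dict are untested and keep their predicted order.
    """
    confirmed = [v for v in ranked if assay.get(v) is True]
    untested = [v for v in ranked if v not in assay]
    return confirmed + untested  # failed variants are dropped


print(rerank_with_assays(["a", "b", "c", "d"], {"b": True, "c": False}))
# ['b', 'a', 'd']
```

A learning module in the fuller sense might instead refit the scoring weights from the assay data; this sketch only shows where that feedback enters the ranking.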

In certain embodiments, to keep the experiments practical, the initial combinatorial library contains approximately 2 million potentially water-soluble GPCR or CXCR4 variants. Of course, libraries with more or fewer variants can also be designed. In certain embodiments, a smaller library may be preferred because it can be optimized based on analysis of the study results described herein. Such analysis will likely establish trends for optimizing the number of domain variants to be recombined, as well as heuristics for selecting domain variants.

In certain embodiments, specific hydrophobic amino acids within the TM regions of a transmembrane protein are selected for modification based on their helix-forming propensity, also known as the "helix prediction score" (see www.proteopedia.org/wiki/index.php/Main_Page). The various fragments are randomly combined to form approximately 2 million ($8^7 = 2{,}097{,}152$) variants of the full-length GPCR gene. The number of predicted variants can generally be expressed as $H^n$, where $n$ is the number of transmembrane regions modified and/or altered by the method ($n = 7$ in the GPCR example) and $H$ is the number of putative variants in each transmembrane region that are available for producing combinatorial variants.
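The library-size arithmetic can be checked directly. The sketch below generalizes $H^n$ slightly, as an assumption, to the case where regions keep different numbers of variants: the size is then the product of the per-region counts, which reduces to $H^n$ when every region keeps $H$ variants.

```python
import math


def library_size(variants_per_region: list[int]) -> int:
    """Number of full-length combinations: the product of the per-region counts.

    With H variants in each of n regions this is H ** n.
    """
    return math.prod(variants_per_region)


print(library_size([8] * 7))  # 2097152, i.e. 8**7, "approximately 2 million"
```

The exponential growth is why trimming each per-region library (for example from 8 to 5 variants, giving 78,125 combinations) shrinks the screen so sharply.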

Once the initial combinatorial library, i.e., the set of domain variants to be recombined, has been selected, nucleic acid molecules (DNA or cDNA molecules) encoding the proteins in the initial combinatorial library can be designed. These nucleic acid molecules are preferably designed with codons optimized, and introns removed, for the expression system chosen to generate the library of coding sequences. For example, if the expression system is E. coli, codons optimized for E. coli expression can be selected; see www.dna20.com/resources/genedesigner. A promoter region, such as a promoter suitable for expression in the chosen expression system (e.g., E. coli), is also selected and operably linked to the coding sequences in the library.
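As a rough illustration of designing coding sequences from protein variants, the sketch below back-translates a protein using one codon per amino acid. The table is a hypothetical simplification (each entry is a valid codon for that residue and is commonly used in E. coli), not the output of the gene-design tool cited above; real codon optimization also weighs full codon-usage tables, GC content, mRNA secondary structure and restriction sites. Because the sequence is built from the protein, it contains no introns by construction.

```python
# Hypothetical simplified table: one E. coli-friendly codon per amino acid.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return an intron-free coding sequence for `protein` (one codon per residue)."""
    try:
        cds = "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return cds + (STOP_CODON if add_stop else "")


print(back_translate("MQTY"))  # ATGCAGACCTATTAA
```

The resulting coding sequence would then be placed downstream of a promoter suited to the chosen host.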

The initial library of coding sequences, or a portion thereof, is then expressed to generate a library of putatively water-soluble GPCRs. This library is then subjected to a ligand-binding assay, in which the putatively water-soluble GPCRs are contacted with a ligand, preferably in an aqueous medium, and ligand binding is detected.

The invention includes transmembrane domain variants obtained or obtainable by the methods described herein, and nucleic acid molecules encoding them.

The invention also contemplates water-soluble GPCR variants ("sGPCRs") characterized by a plurality of transmembrane domains in which, independently for each domain, at least 50%, preferably at least about 60%, more preferably at least about 70% or 80%, for example at least about 90%, of the hydrophobic amino acid residues (L, I, V and F) of the native transmembrane protein (e.g., a GPCR) are replaced by Q, T, T and Y, respectively (that is, L by Q, I by T, V by T and F by Y). The sGPCRs of the invention are characterized by water solubility and ligand binding. In particular, an sGPCR binds the same natural ligand as the corresponding native GPCR.
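The substitution rule and the percentage threshold above can be made concrete with a short sketch. The function names are hypothetical; the mapping is the one stated in the text (L to Q, I to T, V to T, F to Y), and `substituted_fraction` measures, for one TM segment, the share of native L/I/V/F residues that carry their replacement.

```python
from typing import Optional, Set

# L->Q, I->T, V->T, F->Y, as stated for the sGPCRs above.
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def qty_substitute(segment: str, positions: Optional[Set[int]] = None) -> str:
    """Replace L/I/V/F with Q/T/T/Y; if `positions` is given, only at those indices."""
    return "".join(
        QTY[aa] if aa in QTY and (positions is None or i in positions) else aa
        for i, aa in enumerate(segment)
    )


def substituted_fraction(native: str, variant: str) -> float:
    """Fraction of the native L/I/V/F residues that carry their QTY replacement."""
    if len(native) != len(variant):
        raise ValueError("QTY substitution preserves segment length")
    sites = [i for i, aa in enumerate(native) if aa in QTY]
    if not sites:
        return 0.0
    return sum(variant[i] == QTY[native[i]] for i in sites) / len(sites)


print(qty_substitute("ALIVFG"))                    # AQTTYG
partial = qty_substitute("LLLL", positions={0, 1})
print(partial, substituted_fraction("LLLL", partial) >= 0.5)  # QQLL True
```

Because each replacement is one residue for one residue, segment length and register are preserved, which is what allows the substituted segments to be dropped back into the full-length sequence.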

The invention further encompasses a method of treating a disorder or disease mediated by the activity of a membrane protein, comprising the use of a water-soluble polypeptide, wherein the water-soluble polypeptide comprises a modified α-helical domain and retains the ligand-binding activity of the native membrane protein. Examples of such disorders and diseases include, but are not limited to, cancer, small cell lung cancer, melanoma, breast cancer, Parkinson's disease, cardiovascular disease, hypertension and asthma.

As described herein, the water-soluble peptides described herein can be used to treat diseases or disorders mediated by the activity of membrane proteins. In certain aspects, the water-soluble peptides function as "decoys" for membrane receptors, binding ligands that would otherwise activate those receptors. The water-soluble peptides described herein can therefore be used to reduce the activity of membrane proteins: they remain in the circulation and bind specific ligands competitively, thereby reducing the activity of the membrane-bound receptors. For example, the GPCR CXCR4 is overexpressed in small cell lung cancer and promotes tumor cell metastasis. Binding of its ligand by a water-soluble peptide as described herein can significantly reduce metastasis.

The chemokine receptor CXCR4 is known in virology as the principal co-receptor for entry of T-cell-line-tropic HIV (Feng et al. (1996) Science 272: 872-877; Davis et al. (1997) J Exp Med 186: 1793-1798; Zaitseva et al. (1997) Nat Med 3: 1369-1375; Sanchez et al. (1997) J Biol Chem 272: 27529-27531). Stromal cell-derived factor 1 (SDF-1) is a chemokine that interacts specifically with CXCR4. When SDF-1 binds CXCR4, CXCR4 activates Gαi protein-mediated (pertussis toxin-sensitive) signaling (Chen et al. (1998) Mol Pharmacol 53: 177-181), including downstream kinase pathways such as Ras/MAP kinase and phosphatidylinositol 3-kinase (PI3K)/Akt in lymphocytes, megakaryocytes and hematopoietic stem cells (Bleul et al. (1996) Nature 382: 829-833; Deng et al. (1997) Nature 388: 296-300; Kijowski et al. (2001) Stem Cells 19: 453-466; Majka et al. (2001) Folia Histochem Cytobiol 39: 235-244; Sotsios et al. (1999) J Immunol 163: 5954-5963; Vlahakis et al. (2002) J Immunol 169: 5546-5554). In mice engrafted with human lymph nodes, SDF-1 induces migration of CXCR4-positive cells into the engrafted lymph nodes (Blades et al. (2002) J Immunol 168: 4308-4317).

Recent studies have shown that CXCR4 interactions can regulate the migration of metastatic cells. Hypoxia, a reduction in oxygen tension, is a microenvironmental change that occurs in most solid tumors and is a major inducer of tumor angiogenesis and treatment resistance. Hypoxia increases CXCR4 levels (Staller et al. (2003) Nature 425: 307-311). Microarray analysis of a subpopulation of cells with increased metastatic activity, derived from a bone metastasis model, showed that CXCR4 was one of the genes upregulated in the metastatic phenotype. Furthermore, overexpression of CXCR4 in the isolated cells significantly increased metastatic activity (Kang et al. (2003) Cancer Cell 3: 537-549). In samples collected from various breast cancer patients, Muller et al. (Muller et al. (2001) Nature 410: 50-56) found that CXCR4 expression levels were higher in primary tumors than in normal mammary gland or epithelial cells. In addition, CXCR4 antibody treatment was found to inhibit metastasis to regional lymph nodes compared with the isotype control, in which metastases to the lymph nodes and lungs developed in all cases (Muller et al. (2001)). The decoy therapy model is therefore suitable for treating CXCR4-mediated diseases and disorders.

In another embodiment, the invention relates to the treatment of a disease or disorder associated with CXCR4-dependent chemotaxis involving abnormal leukocyte recruitment or activation, wherein the disease is selected from the group consisting of arthritis, psoriasis, multiple sclerosis, ulcerative colitis, Crohn's disease, allergy, asthma, AIDS-associated encephalitis, AIDS-associated maculopapular skin eruption, AIDS-associated interstitial pneumonia, AIDS-associated enteropathy, AIDS-associated periportal pneumonia and AIDS-associated glomerulonephritis.

In another aspect, the invention relates to the treatment of a disease or disorder selected from arthritis, lymphoma, non-small cell lung cancer, lung cancer, breast cancer, prostate cancer, multiple sclerosis, central nervous system developmental disorders, dementia, Parkinson's disease, Alzheimer's disease, tumors, fibroma, astrocytoma, myeloma, glioblastoma, inflammatory diseases, organ transplant rejection, AIDS, HIV infection or angiogenesis.

The invention also encompasses pharmaceutical compositions comprising the water-soluble polypeptide and a pharmaceutically acceptable carrier or diluent.

Depending on the desired formulation, the composition can also include pharmaceutically acceptable, non-toxic carriers or diluents, defined as vehicles commonly used to formulate pharmaceutical compositions for administration to animals or humans. The diluent is selected so as not to affect the biological activity of the drug or pharmaceutical composition. Examples of such diluents are distilled water, physiological phosphate-buffered saline, Ringer's solution, dextrose solution and Hank's solution. The pharmaceutical composition or formulation may also include other carriers, adjuvants, or non-toxic, non-therapeutic, non-immunogenic stabilizers and the like. Pharmaceutical compositions can also include large, slowly metabolized macromolecules such as proteins, polysaccharides such as chitosan, polylactic acids, polyglycolic acids and copolymers (e.g., latex-functionalized SEPHAROSE™, agarose, cellulose and the like), polymerized amino acids, amino acid copolymers and lipid aggregates (e.g., oil droplets or liposomes).

The composition can be administered parenterally, for example by intravenous, intramuscular, intrathecal or subcutaneous injection. Parenteral administration can be achieved by incorporating the composition into a solution or suspension. Such solutions or suspensions may also include sterile diluents such as water for injection, saline, fixed oils, polyethylene glycols, glycerin, propylene glycol or other synthetic solvents. Parenteral formulations may also contain antibacterial agents such as benzyl alcohol or methylparaben, antioxidants such as ascorbic acid or sodium bisulfite, and chelating agents such as EDTA. Buffers such as acetate, citrate or phosphate, and tonicity-adjusting agents such as sodium chloride or dextrose, may also be added. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple-dose vials made of glass or plastic.

In addition, auxiliary substances such as wetting agents, emulsifiers, surfactants and pH-buffering substances may be present in the composition. Other components of pharmaceutical compositions include oils of petroleum, animal, vegetable or synthetic origin, for example peanut oil, soybean oil and mineral oil. In general, glycols such as propylene glycol or polyethylene glycol are preferred liquid carriers, particularly for injectable solutions.

Injectable preparations can be prepared either as liquid solutions or suspensions; solid forms suitable for dissolution or suspension in a liquid vehicle before injection can also be prepared. The preparation may also be emulsified or encapsulated in liposomes or in microparticles such as polylactide, polyglycolide or copolymers to enhance the adjuvant effect, as described above (Langer, Science 249: 1527, 1990 and Hanes, Advanced Drug Delivery Reviews 28: 97-119, 1997). The compositions and agents described herein can be administered in the form of a depot injection or implant preparation, which can be formulated to allow sustained or pulsatile release of the active ingredient.

Transdermal administration includes percutaneous absorption of the composition through the skin. Transdermal formulations include patches, ointments, creams, gels, salves and the like. Transdermal delivery can be achieved using skin patches or transferosomes. See Paul et al., Eur. J. Immunol. 25: 3521-24, 1995 and Cevc et al., Biochim. Biophys. Acta 1368: 201-15, 1998.

"Treat" or "treatment" includes preventing or delaying the onset of symptoms, complications or biochemical indicia of a disease, alleviating or ameliorating its symptoms, or arresting or inhibiting further progression of the disease, condition or disorder. A "patient" is a human subject in need of treatment.

"Effective amount" refers to an amount of a therapeutic agent sufficient to ameliorate one or more symptoms of a disease, and/or prevent progression of the disease, cause regression of the disease, and/or achieve a desired effect.

Computer Systems
Various aspects and functions described herein may be implemented as specialized hardware or software components executing on one or more computer systems. Many examples of computer systems are currently in use. These include, among others, network appliances, personal computers, workstations, mainframes, networked clients, servers, media servers, application servers, database servers and web servers. Other examples of computer systems include mobile computing devices, such as cellular phones and personal digital assistants, and network equipment, such as load balancers, routers and switches. Furthermore, aspects may be located on a single computer system or distributed among multiple computer systems connected by one or more communication networks.

For example, various aspects, functions and methods may be distributed among one or more computer systems configured to provide a service to one or more client computers, or to perform an overall task as part of a distributed system. Additionally, aspects may be performed on a client-server or multi-tier system that includes components distributed among one or more server systems performing various functions. Consequently, embodiments are not limited to executing on any particular system or group of systems. Further, aspects, functions and methods may be implemented in software, hardware or firmware, or any combination thereof. Thus, aspects, functions and methods may be implemented within methods, acts, systems, system elements and components using a variety of hardware and software configurations, and the examples are not limited to any particular distributed architecture, network or communication protocol.

Referring to FIG. 10, a block diagram of a distributed computer system 300, in which various aspects and functions may be practiced, is shown. As illustrated, distributed computer system 300 includes one or more computer systems that exchange information. More specifically, distributed computer system 300 includes computer systems 302, 304 and 306. As shown, computer systems 302, 304 and 306 are interconnected by, and may exchange data through, a communication network 308. Network 308 may include any communication network through which computer systems can exchange data. To exchange data using network 308, computer systems 302, 304 and 306 and network 308 may use various methods, protocols and standards. Examples of these protocols and standards include NAS, Web, storage and other data movement protocols suitable for use in big data environments. To ensure that data transfers are secure, computer systems 302, 304 and 306 may transmit data over network 308 using a variety of security measures, such as SSL or VPN technologies. Although distributed computer system 300 illustrates three networked computer systems, it is not so limited and may include any number of computer systems and computing devices networked using any medium and communication protocol.

As shown in FIG. 10, computer system 302 includes a processor 310, a memory 312, an interconnection element 314, an interface 316 and a data storage element 318. To implement at least some of the aspects, functions and methods disclosed herein, processor 310 executes a series of instructions that results in manipulated data. Processor 310 may be any type of processor, multiprocessor or controller. Example processors include commercially available processors such as an Intel Xeon, Itanium, Core, Celeron or Pentium processor; an AMD Opteron processor; an Apple A4 or A5 processor; a Sun UltraSPARC processor; an IBM Power5+ processor; an IBM mainframe chip; or a quantum computer. Processor 310 is connected by interconnection element 314 to other system components, including one or more memory devices 312.

Memory 312 stores programs (e.g., sequences of instructions coded to be executable by processor 310) and data during operation of computer system 302. Thus, memory 312 may be a relatively high-performance, volatile random access memory such as dynamic RAM ("DRAM") or static RAM ("SRAM"). However, memory 312 may include any device for storing data, such as a disk drive or other non-volatile storage device. Various examples may organize memory 312 into particularized and, in some cases, unique structures to perform the functions disclosed herein. These data structures may be sized and organized to store values for particular data and types of data.

Components of computer system 302 are coupled by an interconnection element such as interconnection element 314. Interconnection element 314 may include any communication coupling between system components, such as one or more physical busses conforming to specialized or standard computing bus technologies such as IDE, SCSI, PCI and InfiniBand. Interconnection element 314 enables communications, including instructions and data, to be exchanged between the system components of computer system 302.

Computer system 302 also includes one or more interface devices 316, such as input devices, output devices and combination input/output devices. Interface devices may receive input or provide output. More particularly, output devices may render information for external presentation, and input devices may accept information from external sources. Examples of interface devices include keyboards, mouse devices, trackballs, microphones, touch screens, printing devices, display screens, speakers, network interface cards and the like. Interface devices allow computer system 302 to exchange information and to communicate with external entities, such as users and other systems.

Data storage element 318 includes a computer-readable and computer-writable non-volatile, or non-transitory, data storage medium in which instructions are stored that define a program or other object executed by processor 310. Data storage element 318 may also include information recorded on or in the medium that is processed by processor 310 during execution of the program. More specifically, the information may be stored in one or more data structures specifically configured to conserve storage space or increase data exchange performance. The instructions may be persistently stored as encoded signals, and the instructions may cause processor 310 to perform any of the functions described herein. The medium may be, for example, an optical disk, magnetic disk or flash memory, among others. In operation, processor 310 or some other controller causes data to be read from the non-volatile recording medium into another memory, such as memory 312, that allows faster access to the information by processor 310 than the storage medium included in data storage element 318 does. The memory may be located in data storage element 318 or in memory 312; processor 310 manipulates the data within the memory and then copies the data to the storage medium associated with data storage element 318 after processing is completed. A variety of components may manage data movement between the storage medium and other memory elements, and the examples are not limited to particular data management components. Further, the examples are not limited to a particular memory system or data storage system.

Although computer system 302 is shown by way of example as one type of computer system upon which various aspects and functions may be practiced, the aspects and functions are not limited to being implemented on computer system 302 as shown in FIG. 10. Various aspects and functions may be practiced on one or more computers having architectures or components different from those shown in FIG. 10. For instance, computer system 302 may include specially programmed, special-purpose hardware, such as an application-specific integrated circuit ("ASIC") tailored to perform a particular operation disclosed herein. Another example may perform the same function using a grid of several general-purpose computing devices running MAC OS System X with Motorola PowerPC processors and several specialized computing devices running proprietary hardware and operating systems.

Computer system 302 may be a computer system including an operating system that manages at least a portion of the hardware elements included in computer system 302. In some examples, a processor or controller, such as processor 310, executes the operating system. Examples of particular operating systems that may be executed include a Windows-based operating system, such as the Windows NT, Windows 2000 (Windows ME), Windows XP, Windows Vista, or Windows 7 operating systems available from Microsoft Corporation; the MAC OS System X operating system or the iOS operating system available from Apple Computer; one of many Linux-based operating system distributions, for example, the Enterprise Linux operating system available from Red Hat Inc.; the Solaris operating system available from Oracle Corporation; or a UNIX operating system available from various sources. Many other operating systems may be used, and the examples are not limited to any particular operating system.

Processor 310 and the operating system together define a computer platform for which application programs in high-level programming languages are written. These component applications may be executable, intermediate bytecode, or interpreted code that communicates over a communication network, for example, the Internet, using a communication protocol, for example, TCP/IP. Similarly, aspects may be implemented using an object-oriented programming language, such as .Net, SmallTalk, Java, C++, Ada, C# (C-Sharp), Python, or JavaScript. Other object-oriented programming languages may also be used. Alternatively, functional, scripting, or logical programming languages may be used.

Additionally, various aspects and functions may be implemented in a non-programmed environment. For example, documents created in HTML, XML, or another format, when viewed in a window of a browser program, can render aspects of a graphical user interface or perform other functions. Further, various examples may be implemented as programmed or non-programmed elements, or any combination thereof. For example, a web page may be implemented using HTML while a data object called from within the web page is written in C++. Thus, the examples are not limited to a specific programming language, and any suitable programming language may be used. Accordingly, the functional components disclosed herein may include a wide variety of elements (e.g., specialized hardware, executable code, data structures, or objects) configured to perform the functions described herein.

In some examples, the components disclosed herein may read parameters that affect the functions performed by those components. These parameters may be physically stored in any form of suitable memory, including volatile memory (such as RAM) or nonvolatile memory (such as a magnetic hard drive). In addition, the parameters may be logically stored in a proprietary data structure (such as a database or file defined by a user-space application) or in a commonly shared data structure (such as an application registry defined by an operating system). In addition, some examples provide both system and user interfaces that allow external entities to modify the parameters and thereby configure the behavior of the components.

Software for performing the computational method is shown generally in FIG. 11A. As described earlier in this specification, the user selects operating parameters for running the procedure on the computer (402), inputs one or more sequences (404), and makes substitutions (408). The system is operable to confirm the secondary structure (408) and to confirm the water solubility of one or more variants. As shown in FIG. 11B, the program may include processing options in addition to those described above: one or more ranking functions may be stored (442), and the ranking function to be used may be selected by the user or automatically by the system (444). The system then generates a ranking as described herein (446); the user may then produce the selected variants (448), measure their function (448), and enter the functional data so that the procedure is modified on that basis (450).
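The selection flow of FIG. 11A and FIG. 11B can be sketched in Python. This is a hypothetical illustration, not the software described here: it assumes the substitution step enumerates partial QTY variants (L→Q, I→T, V→T, F→Y) of a single transmembrane segment, and the names `partial_qty_variants`, `select_variants`, `keeps_helix`, `predicts_tm` and `rank` are all illustrative. The helix and transmembrane checks are passed in as callables because this passage does not fix a particular predictor (Example 1 uses TMHMM for the transmembrane step).

```python
from itertools import product
from typing import Callable, List

# QTY swaps assumed for this sketch: L->Q, I->T, V->T, F->Y
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def partial_qty_variants(segment: str) -> List[str]:
    """Return every variant that swaps any subset of the L/I/V/F sites."""
    sites = [i for i, aa in enumerate(segment) if aa in QTY_MAP]
    variants = []
    for mask in product((False, True), repeat=len(sites)):
        residues = list(segment)
        for site, swap in zip(sites, mask):
            if swap:
                residues[site] = QTY_MAP[residues[site]]
        variants.append("".join(residues))
    return variants


def select_variants(
    segment: str,
    keeps_helix: Callable[[str], bool],  # hypothetical secondary-structure check
    predicts_tm: Callable[[str], bool],  # hypothetical TM predictor (e.g. a TMHMM wrapper)
    rank: Callable[[str], float],        # user- or system-selected ranking function
    top_n: int = 8,
) -> List[str]:
    """Keep helical, non-transmembrane variants and return the top_n by rank."""
    candidates = [
        v for v in partial_qty_variants(segment)
        if keeps_helix(v) and not predicts_tm(v)
    ]
    candidates.sort(key=rank, reverse=True)
    return candidates[:top_n]
```

A segment with k substitutable residues yields 2**k candidates (64 for the six L, I, V and F sites in SEQ ID NO: 4), so exhaustive enumeration is only practical for segments of typical transmembrane length; `top_n=8` simply mirrors the eight variants per domain reported in the examples.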

The invention will be better understood in connection with the following examples, which are intended as illustrations only and do not limit the scope of the invention. Various changes and modifications to the disclosed embodiments will be apparent to those skilled in the art, and such changes may be made without departing from the spirit of the invention and the scope of the appended claims.

Example 1: CXC chemokine receptor type 4 isoform a (CXCR4)
CXCR4 is a chemokine receptor 356 amino acids in length, with a pI of approximately 8.61 and a molecular weight of 40221.19 Da. The sequence of CXCR4 published in the literature is as follows:

This sequence is submitted to TMHMM to identify the transmembrane domains shown in FIG. 3.
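TMHMM is a hidden Markov model and is not reproduced here. As a simpler, swapped-in illustration of how hydrophobic membrane-spanning stretches can be flagged, the sketch below uses a Kyte-Doolittle hydropathy scan with the commonly used 19-residue window and 1.6 threshold. The function names are hypothetical, and the output should not be expected to match TMHMM predictions.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window, indexed by 0-based window start."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge above-threshold windows into 1-based, inclusive (start, end) spans."""
    spans: List[List[int]] = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        end = start + window  # exclusive 0-based end of this window
        if spans and start <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [(s + 1, e) for s, e in spans]
```

Applied to a 20-residue leucine stretch flanked by lysines, the scan reports a single segment; after replacing the leucines with glutamine it reports none, which is the kind of loss of predicted transmembrane character that the examples check for.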

All or substantially all of the hydrophobic amino acids L, I, V and F are replaced with Q, T, T and Y, respectively (L with Q, I and V with T, and F with Y), to yield the following sequence:

The predicted pI of this protein is 8.54, and its molecular weight is 40551.64 Da. Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes a transmembrane domain (TM1) comprising amino acids 47 to 70 of SEQ ID NO: 2, and proteins comprising it. As an example, FIG. 3 shows the α-helix prediction for the TM1 sequence. Preferably, the TM1-comprising proteins herein include one or more (e.g., all) of the extracellular and intracellular loop sequences (the sequences that are not underlined) of SEQ ID NO: 2. Additionally or alternatively, the TM1-comprising proteins herein include one or more further transmembrane regions (the underlined sequences) of SEQ ID NO: 2, or of a homologous sequence that retains one, two, three, or optionally four or more of the native L, I, V and F amino acids set forth in SEQ ID NO: 1.
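The substitution rule and its effect on mass can be sketched in Python. The sketch assumes 1-based inclusive transmembrane ranges as input and uses standard average residue masses; the tool used for the reported pI and molecular weight is not stated, so its values may differ slightly. The names `qty_substitute` and `average_mass` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# QTY swaps as read from the listed variants: L->Q, I->T, V->T, F->Y
QTY_TABLE = str.maketrans("LIVF", "QTTY")

# Standard average residue masses in Da (free amino acid minus one water)
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def qty_substitute(sequence: str,
                   tm_ranges: Optional[Iterable[Tuple[int, int]]] = None) -> str:
    """Apply the QTY swaps inside 1-based inclusive ranges (whole chain if None)."""
    if tm_ranges is None:
        return sequence.translate(QTY_TABLE)
    residues = list(sequence)
    for start, end in tm_ranges:
        # Loops outside the ranges are left untouched
        residues[start - 1:end] = sequence[start - 1:end].translate(QTY_TABLE)
    return "".join(residues)


def average_mass(sequence: str) -> float:
    """Average molecular mass of an unmodified chain, in Da."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER
```

For example, `qty_substitute("IFLPTTYSTTFQTGTTGNGQVTQVM")` (SEQ ID NO: 4) returns `"TYQPTTYSTTYQTGTTGNGQTTQTM"` (SEQ ID NO: 11), about 38.86 Da heavier. Per swap, L→Q adds about 14.97 Da, F→Y about 16.00 Da and V→T about 1.97 Da, while only I→T removes mass (about 12.05 Da), which is consistent with the increase from 40221.19 Da to 40551.64 Da reported above.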

The native CXCR4 protein sequence (differing at the N-terminal amino acid) was again submitted to the method. The program output divided the native sequence into extracellular and intracellular regions and selected eight transmembrane domain variants for each transmembrane domain. The results are shown in FIG. 4 and in the table below.

MEGISIYTSDNYTEEMGSGDYDSMKEPCFREENANFNK (SEQ ID NO: 3; EC1)

TM1 variants:
IFLPTTYSTTFQTGTTGNGQVTQVM (SEQ ID NO: 4)
IFQPTTYSTTFQTGTTGNGQVTQVM (SEQ ID NO: 5)
IFQPTTYSTTFQTGTTGNGQVTQTM (SEQ ID NO: 6)
IFQPTTYSTTYQTGTTGNGQVTQTM (SEQ ID NO: 7)
IFQPTTYSTTYQTGTTGNGQTTQVM (SEQ ID NO: 8)
IFQPTTYSTTYQTGTTGNGQTIQTM (SEQ ID NO: 9)
IFQPTTYSTTYQTGTTGNGQTTQTM (SEQ ID NO: 10)
TYQPTTYSTTYQTGTTGNGQTTQTM (SEQ ID NO: 11)

GYQKKLRSMTDKYR (SEQ ID NO: 12; IC1)

TM2 variants:
LHLSTADQQFTTTQPFWAVDAV (SEQ ID NO: 13)
LHLSVADQQYTTTQPFWATDAV (SEQ ID NO: 14)
LHQSVADQQYVTTQPFWATDAT (SEQ ID NO: 15)
QHQSVADQQFTTTQPFWATDAT (SEQ ID NO: 16)
LHQSVADQQYTITQPYWATDAT (SEQ ID NO: 17)
QHLSVADQQYTITQPYWATDAT (SEQ ID NO: 18)
QHLSTADQQYVTTQPYWATDAT (SEQ ID NO: 19)
QHQSTADQQYTTTQPYWATDAT (SEQ ID NO: 20)

ANWYFGNFLCK (SEQ ID NO: 21; EC2)

TM3 variants:
AVHVTYTVNQYSSVQIQAFT (SEQ ID NO: 22)
AVHTTYTVNQYSSVQIQAFT (SEQ ID NO: 23)
AVHTTYTVNQYSSVQTQAFT (SEQ ID NO: 24)
ATHTTYTVNQYSSVQTQAFT (SEQ ID NO: 25)
ATHTIYTTNQYSSVQTQAFT (SEQ ID NO: 26)
AVHTTYTTNQYSSVQTQAFT (SEQ ID NO: 27)
ATHTTYTTNQYSSVQTQAFT (SEQ ID NO: 28)
ATHTTYTTNQYSSTQTQAYT (SEQ ID NO: 29)

SLDRYLAIVHATNSQRPRKLLAEK (SEQ ID NO: 30; IC2)

TM4 variants:
VTYTGVWTPAQQQTIPDFIF (SEQ ID NO: 31)
TTYTGTWIPAQQQTIPDFIF (SEQ ID NO: 32)
TTYTGTWTPAQQQTIPDFIF (SEQ ID NO: 33)
TTYTGTWTPAQQQTIPDFIY (SEQ ID NO: 34)
TTYVGTWTPAQQQTTPDYIF (SEQ ID NO: 35)
TTYVGTWTPAQQQTTPDFIY (SEQ ID NO: 36)
TTYTGVWTPAQQQTTPDYTF (SEQ ID NO: 37)
TTYTGTWTPAQQQTTPDYTY (SEQ ID NO: 38)

ANVSEADDRYICDRFYPNDLW (SEQ ID NO: 39; EC3)

TM5 variants:
VVVFQFQHTMVGQTQPGTTTQ (SEQ ID NO: 40)
VVVFQFQHTMTGQTQPGTTTQ (SEQ ID NO: 41)
VVVFQYQHTMTGQTQPGTTTQ (SEQ ID NO: 42)
VVVYQYQHTMTGQTQPGTTTQ (SEQ ID NO: 43)
TVVFQYQHTMTGQTQPGTTTQ (SEQ ID NO: 44)
VVTFQYQHTMTGQTQPGTTTQ (SEQ ID NO: 45)
TVVYQYQHTMTGQTQPGTTTQ (SEQ ID NO: 46)
TTTYQYQHTMTGQTQPGTTTQ (SEQ ID NO: 47)

SCYCIIISKLSHSKGHQKRKALKTT (SEQ ID NO: 48; IC3)

TM6 variants:
VTQIQAFFACWQPYYTGTST (SEQ ID NO: 49)
VIQIQAYFACWQPYYTGTST (SEQ ID NO: 50)
VIQIQAYYACWQPYYTGTST (SEQ ID NO: 51)
VIQTQAFYACWQPYYTGTST (SEQ ID NO: 52)
VIQTQAYFACWQPYYTGTST (SEQ ID NO: 53)
VTQIQAFYACWQPYYTGTST (SEQ ID NO: 54)
VIQTQAYYACWQPYYTGTST (SEQ ID NO: 55)
TTQTQAYYACWQPYYTGTST (SEQ ID NO: 56)

DSFILLEIIKQGCEFENTVHK (SEQ ID NO: 57; EC4)

TM7 variants:
WISITEAQAFFHCCLNPIQY (SEQ ID NO: 58)
WISITEAQAFYHCCLNPIQY (SEQ ID NO: 59)
WISITEAQAYFHCCQNPTLY (SEQ ID NO: 60)
WISTTEALAFYHCCQNPTQY (SEQ ID NO: 61)
WISTTEALAYFHCCQNPTQY (SEQ ID NO: 62)
WISITEALAYYHCCQNPTQY (SEQ ID NO: 63)
WISTTEALAYYHCCQNPTQY (SEQ ID NO: 64)
WTSTTEAQAYYHCCQNPTQY

AFLGAKFKTSAQHALTSVSRGSSLKILSKGKRGGHSSVSTESESSSFHSS (SEQ ID NO: 65; IC4)

It will be apparent from the above that the sequences before, between, and after each list of transmembrane domain variants (SEQ ID NOs: 3, 12, 21, 30, 39, 48, 57 and 65) are the N-terminal, intermediate, and C-terminal extracellular and intracellular regions, respectively.

As is known in the art, the above sequences were then used to generate coding sequences suitable for expression in an expression system, in this case yeast. The coding sequences were then recombinantly expressed to produce a library of proteins, each comprising SEQ ID NOs: 3, 12, 21, 30, 39, 48, 57 and 65 with one transmembrane domain variant from each variant list positioned between the respective intracellular and extracellular domains.
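In silico, the library described above is the Cartesian product of the seven variant lists, with the fixed loop segments placed between them. A minimal sketch with hypothetical names, modelling sequence assembly only (not cloning or yeast expression): with eight variants for each of seven transmembrane domains there are 8**7 = 2,097,152 possible full-length combinations, although how many of these a physical library actually covers is not stated here.

```python
from itertools import product
from math import prod
from typing import Iterator, List, Sequence


def library_size(tm_variant_lists: Sequence[Sequence[str]]) -> int:
    """Number of distinct combinations of one variant per TM list."""
    return prod(len(variants) for variants in tm_variant_lists)


def assemble_library(loops: Sequence[str],
                     tm_variant_lists: Sequence[Sequence[str]]) -> Iterator[str]:
    """Yield loop0 + TM1 + loop1 + ... + TMn + loopn for every combination."""
    if len(loops) != len(tm_variant_lists) + 1:
        raise ValueError("need exactly one more loop segment than TM lists")
    for combo in product(*tm_variant_lists):
        parts: List[str] = [loops[0]]
        for tm, loop in zip(combo, loops[1:]):
            parts.extend((tm, loop))
        yield "".join(parts)
```

For CXCR4, `loops` would hold SEQ ID NOs: 3, 12, 21, 30, 39, 48, 57 and 65 in order and `tm_variant_lists` the seven eight-member TM lists; yielding sequences lazily avoids holding all two million strings in memory at once.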

The library so produced was then assayed for binding to the CXCR4 cognate ligand SDF1a (or CCL12), expressed from a plasmid in yeast, with binding taking place in living yeast cells. Ligand binding was detected through gene activation in a yeast two-hybrid system, and the samples were then sequenced. Nineteen CXCR4 variants were sequenced. The results are shown in FIG. 5.

Example 2: CXC chemokine receptor type 3 isoform b (CX3CR1)
CX3CR1 is a chemokine receptor 355 amino acids in length, with a pI of approximately 6.74 and a molecular weight of 40396.4 Da. This sequence is submitted to TMHMM to identify its transmembrane domains. All or substantially all of the hydrophobic amino acids L, I, V and F within the transmembrane domains are replaced with Q, T, T and Y, respectively (L with Q, I and V with T, and F with Y), to yield the following sequence (lower line), shown aligned with the wild type (upper line):

The predicted pI of this protein variant is 6.74, and its molecular weight is 41027.17 Da. Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes a transmembrane domain comprising the underlined amino acids of SEQ ID NO: 67. Preferably, the TM1-comprising proteins herein include one or more (e.g., all) of the extracellular and intracellular loop sequences (the sequences that are not underlined) of SEQ ID NO: 66. Additionally or alternatively, the TM1-comprising proteins herein include one or more further transmembrane regions (the underlined sequences) of SEQ ID NO: 67, or of a homologous sequence that retains one, two, three, or optionally four or more of the native V, L, I and F amino acids set forth in SEQ ID NO: 66.

The native CX3CR1 protein sequence was again submitted to the method. The program output divided the native sequence into extracellular and intracellular regions and selected eight transmembrane domain variants for each transmembrane domain. The results are shown in the table below.

MDQFPESVTENFEYDDLAEACYIGDIVVFGT (SEQ ID NO: 68)

TM1 variants:
TYQSTYYSTTFATGQVGNQQVVFALTNS (SEQ ID NO: 69)
TYQSTYYSTTYATGQVGNQQVVFALTNS (SEQ ID NO: 70)
TYQSTYYSTTYATGQVGNQQVVFAQTNS (SEQ ID NO: 71)
TYQSTYYSTTYATGQTGNLQVTFAQTNS (SEQ ID NO: 72)
TYQSTYYSTTYATGQTGNQLVTFAQTNS (SEQ ID NO: 73)
TYQSTYYSTTYATGQTGNQQVVFAQTNS (SEQ ID NO: 74)
TYQSTYYSTTYATGQTGNLQVTYAQTNS (SEQ ID NO: 75)
TYQSTYYSTTYATGQTGNQQTTYAQTNS (SEQ ID NO: 76)

KKPKSVTDIY (SEQ ID NO: 77)

TM2 variants:
LLNQAQSDQLFVATQPFWTHY (SEQ ID NO: 78)
LLNQAQSDQQFVATQPFWTHY (SEQ ID NO: 79)
QQNLAQSDQQFVATQPFWTHY (SEQ ID NO: 80)
LQNLAQSDQQYTATQPFWTHY (SEQ ID NO: 81)
QLNLAQSDQQYTATQPFWTHY (SEQ ID NO: 82)
LLNQAQSDQQFTATQPYWTHY (SEQ ID NO: 83)
QQNLAQSDQQFTATQPYWTHY (SEQ ID NO: 84)
QQNQAQSDQQYTATQPYWTHY (SEQ ID NO: 85)

LINEKGLHNAMCK (SEQ ID NO: 86)

TM3 variant:
YTTAYYYTGYYGSTYYTTTTST (SEQ ID NO: 87)

DRYLAIVLAANSMNNRT (SEQ ID NO: 88)

TM4 variants:
VQHGTTTSQGTWAAATQVAAPQFMF (SEQ ID NO: 89)
VQHGVTTSQGTWAAATQTAAPQFMF (SEQ ID NO: 90)
VQHGTTTSQGVWAAATQTAAPQFMY (SEQ ID NO: 91)
VQHGTTTSQGTWAAAIQTAAPQFMY (SEQ ID NO: 92)
VQHGTTTSQGTWAAATQTAAPQFMF (SEQ ID NO: 93)
VQHGTTISQGTWAAATQTAAPQYMF (SEQ ID NO: 94)
VQHGTTTSQGTWAAATQTAAPQFMY (SEQ ID NO: 95)
TQHGTTTSQGTWAAATQTAAPQYMY (SEQ ID NO: 96)

TKQKENECLGDYPEVLQEIWPVLRNVET (SEQ ID NO: 97)

TM5 variants:
NFLGFQQPQQIMSYCYFRIT (SEQ ID NO: 98)
NFQGFLQPQQTMSYCYFRIT (SEQ ID NO: 99)
NFQGFLQPQQTMSYCYFRTT (SEQ ID NO: 100)
NFQGFQQPQQTMSYCYYRIT (SEQ ID NO: 101)
NFQGFLQPQQTMSYCYYRTT (SEQ ID NO: 102)
NFQGYLQPQQTMSYCYFRTT (SEQ ID NO: 103)
NYQGFQQPQQTMSYCYFRTT (SEQ ID NO: 104)
NYQGYQQPQQTMSYCYYRTT (SEQ ID NO: 105)

QTLFSCKNHKKAKAIK (SEQ ID NO: 106)

TM6 variants:
LIQQTTTTFYQFWTPYNTMTFQETL (SEQ ID NO: 107)
LIQQTTTTFYQYWTPYNVMTFQETQ (SEQ ID NO: 108)
LIQQTTTTYYQFWTPYNTMTFQETQ (SEQ ID NO: 109)
QIQQTTTTFYQYWTPYNTMTFQETQ (SEQ ID NO: 110)
LTQQTTTTYYQFWTPYNTMTFQETQ (SEQ ID NO: 111)
QIQQTTTTFFQYWTPYNTMTYQETQ (SEQ ID NO: 112)
QIQQTTTTFYQYWTPYNTMTYQETQ (SEQ ID NO: 113)
QTQQTTTTYYQYWTPYNTMTYQETQ (SEQ ID NO: 114)

KLYDFFPSCDMRKDLRL (SEQ ID NO: 115)

TM7 variants:
ALSVTETVAFSHCCQNPQIYAFAG (SEQ ID NO: 116)
AQSVTETTAFSHCCQNPLIYAFAG (SEQ ID NO: 117)
ALSVTETVAFSHCCQNPQTYAYAG (SEQ ID NO: 118)
AQSVTETTAFSHCCQNPQIYAYAG (SEQ ID NO: 119)
ALSVTETTAFSHCCQNPQTYAYAG (SEQ ID NO: 120)
ALSTTETTAYSHCCQNPQIYAFAG (SEQ ID NO: 121)
ALSVTETTAYSHCCQNPQTYAYAG (SEQ ID NO: 122)
AQSTTETTAYSHCCQNPQTYAYAG (SEQ ID NO: 123)

EKFRRYLYHLYGKCLAVLCGRSVHVDFSSSESQRSRHGSVLSSNFTYHTSDGDALLLL (SEQ ID NO: 124)

As in Example 1 above, the sequences before, between, and after each list of transmembrane domain variants are the N-terminal, intermediate, and C-terminal intracellular or extracellular regions, respectively.
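Because the variants within a list differ only by QTY swaps, a pairwise comparison can confirm that a listed sequence is a legitimate variant of another. A hypothetical helper is sketched below; it accepts swaps in either direction, since some lists mix more and less substituted positions (for example, position 19 is Q in SEQ ID NO: 69 but L in SEQ ID NO: 72).

```python
from typing import List, Tuple

QTY_PAIRS = {("L", "Q"), ("I", "T"), ("V", "T"), ("F", "Y")}


def qty_differences(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference residue, variant residue) for each change.

    Raises ValueError if the lengths differ or if a change is not a QTY swap
    in either direction.
    """
    if len(reference) != len(variant):
        raise ValueError("substitution-only variants must have equal length")
    changes = []
    for pos, (a, b) in enumerate(zip(reference, variant), start=1):
        if a == b:
            continue
        if (a, b) not in QTY_PAIRS and (b, a) not in QTY_PAIRS:
            raise ValueError(f"position {pos}: {a}->{b} is not a QTY swap")
        changes.append((pos, a, b))
    return changes
```

Comparing the first and last TM1 variants of CX3CR1 (SEQ ID NOs: 69 and 76) returns six changes: F→Y at position 11, V→T at positions 16, 21 and 22, F→Y at position 23, and L→Q at position 25.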

As is known in the art, the above sequences were then used to generate coding sequences suitable for expression in an expression system, in this case yeast. The coding sequences were then recombinantly expressed to produce a library of proteins, each comprising SEQ ID NOs: 68, 77, 86, 88, 97, 106 and 115 with one transmembrane domain variant from each variant list positioned between the respective intracellular and extracellular domains.

The library so produced was then assayed in aqueous medium for binding to the CX3CR1 cognate ligand (CXCL1), as described in Example 1. Ligand binding was detected, and the samples were then sequenced. Seven variants were sequenced. The results are shown in FIG. 6.

Example 3: CCR3 variants
The method of Example 1 was repeated for chemokine receptor type 3 isoform 3.

All or substantially all of the hydrophobic amino acids L, I, V and F within its transmembrane domains were replaced with Q, T, T and Y, respectively (L with Q, I and V with T, and F with Y), to yield the following sequence (lower line), shown aligned with the wild type (upper line):

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes a transmembrane domain comprising the underlined amino acids of SEQ ID NO: 126. Preferably, the TM1-comprising proteins herein include one or more (e.g., all) of the extracellular and intracellular loop sequences (the sequences that are not underlined) of SEQ ID NO: 126. Additionally or alternatively, the TM1-comprising proteins herein include one or more further transmembrane regions (the underlined sequences) of SEQ ID NO: 126, or of a homologous sequence that retains one, two, three, or optionally four or more of the native V, L, I and F amino acids set forth in SEQ ID NO: 125.

The native CCR3 protein sequence was again submitted to the method (note the differences in the N-terminal sequence). The program output divided the native sequence into extracellular and intracellular regions and selected eight transmembrane domain variants for each transmembrane domain. The results are shown in the table below.

MTTSLDTVETFGTTSYYDDVGLLCEKADTRALMA (SEQ ID NO: 127)

TM1 variants:
QFVPPQYSQTFTTGQQGNVTVTMTQIKY (SEQ ID NO: 128)
QFVPPQYSQTFTTGQQGNTTVTMTQIKY (SEQ ID NO: 129)
QFVPPQYSQTYTTGQQGNTTVTMTQIKY (SEQ ID NO: 130)
QFTPPQYSQTYTTGQQGNVTTTMTQIKY (SEQ ID NO: 131)
QFTPPQYSQTYTTGQQGNTVTTMTQIKY (SEQ ID NO: 132)
QFTPPQYSQTYTTGQQGNTTVTMTQIKY (SEQ ID NO: 133)
QFTPPQYSQTYTTGQQGNTTTTMTQIKY (SEQ ID NO: 134)
QYTPPQYSQTYTTGQQGNTTTTMTQTKY (SEQ ID NO: 135)

RRLRIMTNIY (SEQ ID NO: 136)

TM2 variants:
LLNQATSDQQFQVTQPFWIHY (SEQ ID NO: 137)
LQNQAISDQLFQTTQPFWTHY (SEQ ID NO: 138)
QQNLAISDQQFQTTQPFWTHY (SEQ ID NO: 139)
QLNQAISDQQFQTTQPYWTHY (SEQ ID NO: 140)
QQNLAISDQQYQVTQPYWTHY (SEQ ID NO: 141)
LQNQATSDQLFQTTQPYWTHY (SEQ ID NO: 142)
QQNQAISDQQYQVTQPYWTHY (SEQ ID NO: 143)
QQNQATSDQQYQTTQPYWTHY (SEQ ID NO: 144)

VRGHNWVFGHGMCK (SEQ ID NO: 145)

TM3 variants:
LQSGFYHTGQYSETFFTTQQTT (SEQ ID NO: 146)
QLSGFYHTGQYSETFFTTQQTT (SEQ ID NO: 147)
QLSGFYHTGQYSETFYTTQQTT (SEQ ID NO: 148)
QLSGFYHTGQYSETYFTTQQTT (SEQ ID NO: 149)
QLSGYYHTGQYSETFFTTQQTT (SEQ ID NO: 150)
QQSGFYHTGQYSETFFTTQQTT (SEQ ID NO: 151)
QQSGFYHTGQYSETFYTTQQTT (SEQ ID NO: 152)
QQSGYYHTGQYSETYYTTQQTT (SEQ ID NO: 153)

DRYLAIVHAVFALRART (SEQ ID NO: 154)

TM4 variants:
TTFGTTTSTVTWGQAVQAAQPEFIF (SEQ ID NO: 155)
TTFGTTTSTTTWGQAVQAAQPEFIF (SEQ ID NO: 156)
TTYGTTTSTTTWGQAVQAAQPEFIF (SEQ ID NO: 157)
TTYGTTTSTTTWGQAVQAAQPEFTF (SEQ ID NO: 158)
TTYGTTTSTTTWGQATQAAQPEFIF (SEQ ID NO: 159)
TTFGTTTSTTTWGQATQAAQPEFIY (SEQ ID NO: 160)
TTYGTTTSTTTWGQATQAAQPEFIY (SEQ ID NO: 161)
TTYGTTTSTTTWGQATQAAQPEYTY (SEQ ID NO: 162)

YETEELFEETLCSALYPEDTVYSWRHFHTLRM (SEQ ID NO: 163)

TM5 variants:
TIFCQVQPQQTMATCYTGTT (SEQ ID NO: 164)
TIFCQTQPQQVMATCYTGTT (SEQ ID NO: 165)
TIFCQTQPQQTMATCYTGIT (SEQ ID NO: 166)
TIFCQTQPQQTMATCYTGTI (SEQ ID NO: 167)
TTFCQVQPQQVMATCYTGTT (SEQ ID NO: 168)
TIYCQVQPQQVMATCYTGTT (SEQ ID NO: 169)
TIFCQTQPQQTMATCYTGTT (SEQ ID NO: 170)
TTYCQTQPQQTMATCYTGTT (SEQ ID NO: 171)

KTLLRCPSKKKYKAIR (SEQ ID NO: 172)

TM6 variant:
QTYTTMATYYTYWTPYNTATQQSSY (SEQ ID NO: 173)

QSILFGNDCERSKHLDL (SEQ ID NO: 174)

TM7 variants:
VMQVTEVTAYSHCCMNPVTYAFTG (SEQ ID NO: 175)
VMQVTEVTAYSHCCMNPTTYAYVG (SEQ ID NO: 176)
VMLTTEVTAYSHCCMNPTTYAFTG (SEQ ID NO: 177)
VMQVTETTAYSHCCMNPVTYAYTG (SEQ ID NO: 178)
TMQVTETIAYSHCCMNPTTYAFTG (SEQ ID NO: 179)
TMQVTETTAYSHCCMNPTTYAFVG (SEQ ID NO: 180)
VMQTTETIAYSHCCMNPTTYAYTG (SEQ ID NO: 181)
TMQTTETTAYSHCCMNPTTYAYTG (SEQ ID NO: 182)

ERFRKYLRHFFHRHLLMHLGRYIPFLPSEKLERTSSVSPSTAEPELSIVF (SEQ ID NO: 183)

As is known in the art, the above sequences were then used to generate coding sequences suitable for expression in an expression system, in this case yeast. The coding sequences were then recombinantly expressed to produce a library of proteins, each comprising SEQ ID NOs: 127, 136, 145, 154, 163, 172, 174 and 183 with one transmembrane domain variant from each variant list positioned between the respective intracellular and extracellular domains.
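Since the variants are generated by substitution only, every entry within one variant list should have the same length, which gives a cheap consistency check when such tables are transcribed before building coding sequences. A hypothetical sketch that takes the most common length in each list as the expected one:

```python
from collections import Counter
from typing import Dict, List


def length_outliers(variant_lists: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per list name, the entries whose length differs from the list's mode."""
    outliers = {}
    for name, variants in variant_lists.items():
        expected, _ = Counter(len(v) for v in variants).most_common(1)[0]
        odd = [v for v in variants if len(v) != expected]
        if odd:
            outliers[name] = odd
    return outliers
```

Run over the CCR3 TM7 list above (SEQ ID NOs: 175 to 182), every entry is 24 residues long and nothing is reported; an entry that lost a residue in transcription would be flagged.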

The library so produced was then assayed in aqueous medium for binding to the CCR3 cognate ligand, CCL3, as described in Example 1. Ligand binding was detected, and the samples were then sequenced. Eleven variants were sequenced. The results are shown in FIG. 7.

Example 4: CCR5 Variants
The method of Example 1 was repeated for chemokine receptor type 5 isoform 3.

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes a transmembrane domain comprising the underlined amino acids of SEQ ID NO: 185. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences) of SEQ ID NO: 185. Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of SEQ ID NO: 185, or of a homologous sequence that retains one, two, three or optionally four or more of the native V, L, I and F amino acids set forth in SEQ ID NO: 184.
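The homologous sequences described above keep some native V, L, I and F residues and substitute the rest. The Python below is a minimal sketch that assumes the QTY mapping used in these examples (L to Q, I and V to T, F to Y). It enumerates every partially substituted version of a transmembrane segment that keeps exactly `keep` native hydrophobic residues. The function name `partial_qty_variants` and the toy segment are hypothetical illustrations, not the selection program used in the examples.

```python
from itertools import combinations

# QTY mapping assumed from these examples: L->Q, I->T, V->T, F->Y.
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def partial_qty_variants(tm: str, keep: int) -> list[str]:
    """Return all variants of segment `tm` in which exactly `keep` of the
    L/I/V/F residues stay native and all others are QTY-substituted."""
    sites = [i for i, aa in enumerate(tm) if aa in QTY]
    variants = []
    for kept in combinations(sites, keep):
        kept_set = set(kept)
        variants.append("".join(
            QTY[aa] if (aa in QTY and i not in kept_set) else aa
            for i, aa in enumerate(tm)
        ))
    return variants


# Hypothetical toy segment with four substitutable sites (L, I, V, F).
print(partial_qty_variants("GLIVFA", 0))       # ['GQTTYA']
print(len(partial_qty_variants("GLIVFA", 1)))  # 4
```

For n substitutable sites, the number of candidates is the binomial coefficient C(n, keep). Exhaustive enumeration therefore grows quickly for long segments.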

The native protein sequence of CCR5 was again subjected to the method (note the differences in the N-terminal sequence). The program output divided the native sequence into extracellular and intracellular regions and selected eight transmembrane domain variants for each transmembrane domain. The results are shown in the table below.

MDYQVSSPIYDINYYTSEPCQKINVKQIAA (SEQ ID NO: 186)

TM1 variants:
RLQPPQYSQTFTFGFTGNMQVTQTQINC (SEQ ID NO: 187)
RLQPPQYSQTFTFGYTGNMQVTQTQINC (SEQ ID NO: 188)
RQQPPQYSQTFTFGFTGNMQTTQTQINC (SEQ ID NO: 189)
RQQPPQYSQTFTYGFTGNMQTTQTQINC (SEQ ID NO: 190)
RQQPPQYSQTYTFGFTGNMQTTQTQINC (SEQ ID NO: 191)
RQQPPQYSQTFTFGYTGNMQTTQTQINC (SEQ ID NO: 192)
RQQPPQYSQTYTFGYTGNMQTTQTQINC (SEQ ID NO: 193)
RQQPPQYSQTYTYGYTGNMQTTQTQTNC (SEQ ID NO: 194)

KRLKSMTDIY (SEQ ID NO: 195)

TM2 variants:
LQNQAISDQFFQQTVPFWAHY (SEQ ID NO: 196)
LQNQAISDQFFQQTTPFWAHY (SEQ ID NO: 197)
LQNQAISDQFFQQTTPYWAHY (SEQ ID NO: 198)
LQNQAISDQFYQQTTPYWAHY (SEQ ID NO: 199)
LQNQAISDQYFQQTTPYWAHY (SEQ ID NO: 200)
LQNQATSDQFFQQTTPYWAHY (SEQ ID NO: 201)
LQNQAISDQYYQQTTPYWAHY (SEQ ID NO: 202)
QQNQATSDQYYQQTTPYWAHY (SEQ ID NO: 203)

AAAQWDFGNTMCQ (SEQ ID NO: 204)

TM3 variants:
QQTGQYFTGYYSGTYYTTQQTT (SEQ ID NO: 205)
QQTGQYYTGYYSGTYYTTQQTT (SEQ ID NO: 206)

DRYLAVVHAVFALKART (SEQ ID NO: 207)

TM4 variant:
TTYGTTTSTTTWTTATYASQPGTTY (SEQ ID NO: 208)

TRSQKEGLHYTCSSHFPYSQYQFWKNFQTLKI (SEQ ID NO: 209)

TM5 variants:
VIQGQVQPQQVMVTCYSGIQ (SEQ ID NO: 210)
VIQGQVQPQQVMTTCYSGIQ (SEQ ID NO: 211)
VIQGQVQPQQTMTTCYSGIQ (SEQ ID NO: 212)
VTQGQVQPQQTMVTCYSGTQ (SEQ ID NO: 213)
TIQGQVQPQQVMTTCYSGTQ (SEQ ID NO: 214)
TIQGQVQPQQTMVTCYSGTQ (SEQ ID NO: 215)
TTQGQVQPQQVMTTCYSGTQ (SEQ ID NO: 216)
TTQGQTQPQQTMTTCYSGTQ (SEQ ID NO: 217)

KTLLRCRNEKKRHRAVR (SEQ ID NO: 218)

TM6 variants:
QTFTTMTTYYQFWAPYNIVQQLNTF (SEQ ID NO: 219)
QTFTTMTTYYQFWAPYNTVQQLNTF (SEQ ID NO: 220)
QTFTTMTTYYQYWAPYNTVQQLNTF (SEQ ID NO: 221)
QTFTTMTTYYQYWAPYNTVQQQNTF (SEQ ID NO: 222)
QTYTTMTTYYQYWAPYNTVQQLNTF (SEQ ID NO: 223)
QTFTTMTTYYQYWAPYNTTQQLNTF (SEQ ID NO: 224)
QTYTTMTTYYQYWAPYNTVQQQNTF (SEQ ID NO: 225)
QTYTTMTTYYQYWAPYNTTQQQNTY (SEQ ID NO: 225)

QEFFGLNNCSSSNRLDQ (SEQ ID NO: 226)

TM7 variants:
AMQVTETQGMTHCCINPIIYAFVG (SEQ ID NO: 227)
AMQVTETLGMTHCCTNPIIYAFTG (SEQ ID NO: 228)
AMQVTETQGMTHCCINPTIYAYVG (SEQ ID NO: 229)
AMQTTETQGMTHCCINPITYAFTG (SEQ ID NO: 230)
AMQTTETQGMTHCCINPTIYAFTG (SEQ ID NO: 231)
AMQVTETQGMTHCCTNPTIYAYVG (SEQ ID NO: 232)
AMQTTETQGMTHCCINPTTYAYVG (SEQ ID NO: 233)
AMQTTETQGMTHCCTNPTTYAYTG (SEQ ID NO: 234)

EKFRNYLLVFFQKHIAKRFCKCCSIFQQEAPERASSVYTRSTGEQEISVGL (SEQ ID NO: 235)

As in Example 1 above, the sequences before, between and after the lists of transmembrane domain variants are the N-terminal, intervening and C-terminal intracellular or extracellular regions, respectively.
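The variants are generated by QTY substitution, so two entries from the same list should have the same length and differ only at L, I, V or F positions. The Python below sketches a minimal consistency check that assumes only the QTY substitutions (L to Q, I and V to T, F to Y) are allowed. It compares two aligned entries and reports each difference. The helper `qty_differences` is hypothetical. Its reference entry is the least-substituted CCR5 TM1 entry (SEQ ID NO: 187), which is itself a variant rather than the wild-type sequence.

```python
# QTY mapping assumed from these examples: L->Q, I->T, V->T, F->Y.
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def qty_differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference residue, variant residue) for each
    mismatch; raise ValueError on a length mismatch or a non-QTY change."""
    if len(reference) != len(variant):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(variant)}")
    diffs = []
    for pos, (a, b) in enumerate(zip(reference, variant), start=1):
        if a == b:
            continue
        if QTY.get(a) != b:
            raise ValueError(f"position {pos}: {a}->{b} is not a QTY substitution")
        diffs.append((pos, a, b))
    return diffs


seq187 = "RLQPPQYSQTFTFGFTGNMQVTQTQINC"  # CCR5 TM1, SEQ ID NO: 187
seq194 = "RQQPPQYSQTYTYGYTGNMQTTQTQTNC"  # CCR5 TM1, SEQ ID NO: 194
print(qty_differences(seq187, seq194))
# [(2, 'L', 'Q'), (11, 'F', 'Y'), (13, 'F', 'Y'), (15, 'F', 'Y'), (21, 'V', 'T'), (26, 'I', 'T')]
```

A length mismatch or a non-QTY change raises `ValueError`. This makes the check useful for catching transcription errors in sequence lists.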

The above sequences were then used, as is known in the art, to generate coding sequences suitable for expression in an expression system, in this case yeast. The coding sequences were then recombinantly expressed to produce a library of proteins having SEQ ID NOs: 186, 195, 204, 207, 209, 218, 226 and 235, with each protein containing one transmembrane domain variant from each variant list between the respective intracellular and extracellular domains.
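The library described above combines one variant from each TM list with the fixed loop sequences. The Python sketch below shows only the in-silico combinatorics and assumes that every combination is allowed. The actual library was built from coding sequences expressed in yeast, and the source does not state how many combinations were represented. `assemble_library` and `library_size` are hypothetical names. With the CCR5 list sizes given above (8, 8, 2, 1, 8, 8 and 8 variants for TM1 to TM7), the number of possible full-length combinations is 8 × 8 × 2 × 1 × 8 × 8 × 8 = 65,536.

```python
from itertools import product
from math import prod


def library_size(variants_per_tm: list[int]) -> int:
    """Number of distinct constructs when one variant is chosen per TM domain."""
    return prod(variants_per_tm)


def assemble_library(loops: list[str], tm_variant_lists: list[list[str]]):
    """Yield loop0 + TM1 + loop1 + ... + TMn + loopn for every combination.

    `loops` holds the N-terminal region, the intervening loops and the
    C-terminal region, so it must be one element longer than `tm_variant_lists`.
    """
    if len(loops) != len(tm_variant_lists) + 1:
        raise ValueError("need exactly one more loop than TM variant lists")
    for choice in product(*tm_variant_lists):
        parts = [loops[0]]
        for tm, loop in zip(choice, loops[1:]):
            parts.extend((tm, loop))
        yield "".join(parts)


print(library_size([8, 8, 2, 1, 8, 8, 8]))  # 65536 (CCR5 TM1..TM7 list sizes)

# Hypothetical toy construct with two TM domains.
print(list(assemble_library(["N", "x", "C"], [["A", "B"], ["D"]])))
# ['NAxDC', 'NBxDC']
```

`assemble_library` is a generator, so it can stream through large libraries without holding every sequence in memory.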

The library so produced was then assayed in aqueous medium for binding to the CCR5 cognate ligand CCL5, as described in Example 1. Ligand binding was detected, and the samples were then sequenced. One variant was sequenced. The results are shown in FIG. 8.

Example 5: CXCR3 Variants
The method of Example 1 was repeated for CXC chemokine receptor type 3 isoform 2. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (SEQ ID NO: 325, lower line), aligned to the wild type (SEQ ID NO: 324, upper line).

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of SEQ ID NO: 325, or of a homologous sequence that retains one, two, three or optionally four or more of the native V, L, I and F amino acids set forth in SEQ ID NO: 324.
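The substitution described for this example can be sketched as a function that rewrites only the residues inside given transmembrane spans. This is a minimal Python illustration using the mapping L to Q, I and V to T, F to Y. It assumes the spans are already known and supplies them by hand. In the examples, the spans come from the program's transmembrane prediction, which is not modeled here. `qty_substitute` and the toy sequence are hypothetical.

```python
# QTY mapping assumed from these examples: L->Q, I->T, V->T, F->Y.
QTY = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def qty_substitute(seq: str, tm_spans: list[tuple[int, int]]) -> str:
    """Apply the QTY code only inside the given transmembrane spans.

    Spans are 0-based, half-open (start, end) indices into `seq`; residues
    outside them (the intracellular and extracellular loops) are unchanged.
    """
    residues = list(seq)
    for start, end in tm_spans:
        for i in range(start, end):
            residues[i] = QTY.get(residues[i], residues[i])
    return "".join(residues)


# Hypothetical toy sequence: loop "KK", TM segment "LIVFM", loop "LL".
print(qty_substitute("KKLIVFMLL", [(2, 7)]))  # KKQTTYMLL
```

Residues other than L, I, V and F (such as M here) are left unchanged even inside a span.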

The native protein sequence of CXCR3 was subjected to the method as described above. The program output divided the native sequence into extracellular and intracellular regions and selected eight transmembrane domain variants for each transmembrane domain. The results are shown in the table below.

MVLEVSDHQVLNDAEVAALLENFSSSYDYGENESDSCCTSPPCPQDFSLNFDR (SEQ ID NO: 235)

TM1 variants:
AFLPALYSQQFQQGQQGNGAVAATQLS (SEQ ID NO: 236)
AFQPALYSQQFQQGQQGNGAVAAVQQS (SEQ ID NO: 237)
AFQPAQYSQQFLQGQQGNGAVAATQQS (SEQ ID NO: 238)
AYQPALYSLQYQQGQQGNGATAAVQQS (SEQ ID NO: 239)
AYQPALYSQLFQQGQQGNGATAATQQS (SEQ ID NO: 240)
AFQPALYSLQYQQGQQGNGATAATQQS (SEQ ID NO: 241)
AYQPAQYSLQYQQGQQGNGATAAVQQS (SEQ ID NO: 242)
AYQPAQYSQQYQQGQQGNGATAATQQS (SEQ ID NO: 243)

RRTALSSTD (SEQ ID NO: 244)

TM2 variants:
TFLQHLAVADTQQVQTLPQWA (SEQ ID NO: 245)
TFLQHQAVADTQLVQTQPQWA (SEQ ID NO: 246)
TFQQHLAVADTQQVQTQPQWA (SEQ ID NO: 247)
TYLQHQAVADTQQVQTQPQWA (SEQ ID NO: 248)
TYQLHQAVADTQQVQTQPQWA (SEQ ID NO: 249)
TYQQHLAVADTQQVQTQPQWA (SEQ ID NO: 250)
TYQQHQAVADTQQVQTQPQWA (SEQ ID NO: 251)
TYQQHQATADTQQTQTQPQWA (SEQ ID NO: 252)

VDAAVQWVFGSGLCK (SEQ ID NO: 253)

TM3 variants:
TAGAQYNTNFYAGAQQQACISF (SEQ ID NO: 254)
TAGAQYNTNFYAGAQLQACTSF (SEQ ID NO: 255)
TAGAQYNTNFYAGAQQLACTSF (SEQ ID NO: 256)
TAGAQFNTNYYAGAQQQACISF (SEQ ID NO: 257)
TAGAQYNTNYYAGAQQQACISF (SEQ ID NO: 258)
TAGAQYNTNYYAGAQLQACTSF (SEQ ID NO: 259)
TAGAQYNTNYYAGAQQLACTSF (SEQ ID NO: 260)
TAGAQYNTNYYAGAQQQACTSY (SEQ ID NO: 261)

DRYLNIVHATQLYRRGPPARVT (SEQ ID NO: 262)

TM4 variants:
LTCQAVWGQCQQFAQPDFIF (SEQ ID NO: 263)
QTCQAVWGQCQQFAQPDFIF (SEQ ID NO: 264)
QTCQATWGQCQQFAQPDFIF (SEQ ID NO: 265)
QTCQATWGQCQQYAQPDFIF (SEQ ID NO: 266)
QTCQATWGQCQQFAQPDFTF (SEQ ID NO: 267)
QTCQATWGQCQQFAQPDYIF (SEQ ID NO: 268)
QTCQATWGQCQQYAQPDYIF (SEQ ID NO: 269)
QTCQATWGQCQQYAQPDYTY (SEQ ID NO: 270)

LSAHHDERLNATHCQYNFPQVGR (SEQ ID NO: 271)

TM5 variant:
TAQRTQQQTAGYQQPQQTMAY (SEQ ID NO: 272)

CYAHILAVLLVSRGQRRLRAMR (SEQ ID NO: 273)

TM6 variants:
QVTTTTVAFAQCWTPYHQVVQV (SEQ ID NO: 274)
QVTTTTVAFAQCWTPYHQTVQV (SEQ ID NO: 275)
QVTTTTTAFAQCWTPYHQTVQV (SEQ ID NO: 276)
QVTTTTTAYAQCWTPYHQTVQV (SEQ ID NO: 277)
QVTTTTTAFAQCWTPYHQTTQV (SEQ ID NO: 278)
QTTTTTVAFAQCWTPYHQTTQV (SEQ ID NO: 279)
QVTTTTTAYAQCWTPYHQTTQV (SEQ ID NO: 280)
QTTTTTTAYAQCWTPYHQTTQT (SEQ ID NO: 281)

DILMDLGALARNCGRESRVDV (SEQ ID NO: 282)

TM7 variants:
AKSVTSGQGYMHCCLNPLQYAFV (SEQ ID NO: 283)
AKSVTSGQGYMHCCLNPQLYAFT (SEQ ID NO: 284)
AKSVTSGQGYMHCCLNPLQYAFT (SEQ ID NO: 285)
AKSTTSGQGYMHCCLNPQQYAFV (SEQ ID NO: 286)
AKSTTSGQGYMHCCQNPLQYAFV (SEQ ID NO: 287)
AKSTTSGQGYMHCCQNPQLYAFV (SEQ ID NO: 288)
AKSTTSGQGYMHCCQNPLQYAFT (SEQ ID NO: 289)
AKSTTSGQGYMHCCQNPQQYAYT (SEQ ID NO: 290)

GVKFRERMWMLLLRLGCPNQRGLQRQPSSSRRDSSWSETSEASYSGL (SEQ ID NO: 291)
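One way to see the effect of the substitutions is to compare average hydropathy. The sketch below uses the Kyte-Doolittle scale as a stand-in. This is an assumption for illustration, not the transmembrane predictor used in the examples. It computes the grand average of hydropathy (GRAVY) for the least- and most-substituted CXCR3 TM1 entries (SEQ ID NOs: 236 and 243). The six L, F and V to Q, Y and T changes between these entries lower the per-residue average by 35.0/27, or about 1.30. The function name `gravy` is a hypothetical choice.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)


tm1_236 = "AFLPALYSQQFQQGQQGNGAVAATQLS"  # CXCR3 TM1, SEQ ID NO: 236
tm1_243 = "AYQPAQYSQQYQQGQQGNGATAATQQS"  # CXCR3 TM1, SEQ ID NO: 243
print(f"{gravy(tm1_236):.2f} vs {gravy(tm1_243):.2f}")
```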

The above sequences can be used, as is known in the art, to generate coding sequences suitable for expression in an expression system, in this case yeast. The coding sequences were then recombinantly expressed to produce a library of proteins having the intracellular and extracellular loops above, with each protein containing one transmembrane domain variant from each variant list between the respective intracellular and extracellular domains.

The library so produced can then be assayed for binding to the cognate ligand in aqueous medium, as described in Example 1.

Example 6: (CCR-1) CC chemokine receptor type 1
Example 1 was repeated for the title protein.

Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 293), aligned to the wild type (upper line, SEQ ID NO: 292).

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes transmembrane domains each comprising one of the underlined domains. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of the depicted protein, or of a homologous sequence that retains one, two, three or optionally four or more of the native V, L, I and F amino acids set forth in the wild-type sequence.

The wild-type sequence can be subjected to the method described above to select further transmembrane domain variants, as described in Example 1. Coding sequences can then be designed and recombinantly expressed, and the expressed proteins can be assayed for ligand binding as described herein.

Example 7: (CCR-2) CC chemokine receptor type 2 isoform A
Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 295), aligned to the wild type (upper line, SEQ ID NO: 294).

Example 8: (CCR-4) CC chemokine receptor type 4
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 297), aligned to the wild type (upper line, SEQ ID NO: 296).

Example 9: (CCR-6) CC chemokine receptor type 6
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 299), aligned to the wild type (upper line, SEQ ID NO: 298).

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes transmembrane domains each comprising one of the underlined domains. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of the depicted protein, or of a homologous sequence that retains one, two, three or optionally four or more of the native L, I, V and F amino acids set forth in the wild-type sequence.

Example 10: (CCR-7) CC chemokine receptor type 7 precursor
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 301), aligned to the wild type (upper line, SEQ ID NO: 300).

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes transmembrane domains each comprising one of the underlined domains. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of the depicted protein, or of a homologous sequence that retains one, two, three or optionally four or more of the native L, I, V and F amino acids set forth in the wild-type sequence.

Example 11: (CCR-8) CC chemokine receptor type 8
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 303), aligned to the wild type (upper line, SEQ ID NO: 302).

Example 12: (CCR-9) CC chemokine receptor type 9 isoform B
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 305), aligned to the wild type (upper line, SEQ ID NO: 304).

Example 13: (CCR-10) CC chemokine receptor type 10
Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 307), aligned to the wild type (upper line, SEQ ID NO: 306).

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes transmembrane domains each comprising one of the underlined domains. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of the depicted protein, or of a homologous sequence that retains one, two, three or optionally four or more of the native L, I, V and F amino acids set forth in the wild-type sequence. The wild-type sequence can be subjected to the method described above to select further transmembrane domain variants, as described in Example 1. Coding sequences can then be designed and recombinantly expressed, and the expressed proteins can be assayed for ligand binding as described herein.

Example 14: (CXCR1) Chemokine receptor type 1
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 309), aligned to the wild type (upper line, SEQ ID NO: 308).

Example 15: (CXR1) CXR chemokine receptor type 1
Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 311), aligned to the wild type (upper line, SEQ ID NO: 310).

Example 16: (CXCR2) Chemokine receptor type 2
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 313), aligned to the wild type (upper line, SEQ ID NO: 312).

Example 17: (CCR-10) CC chemokine receptor type 10
Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 315), aligned to the wild type (upper line, SEQ ID NO: 314).

Example 18: (CXCR6) Chemokine receptor type 6
Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 317), aligned to the wild type (upper line, SEQ ID NO: 316).

Example 19: (CXCR7) Chemokine receptor type 7
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 319), aligned to the wild type (upper line, SEQ ID NO: 318).

Example 20: (CLR-1a) Chemokine-like receptor 1 isoform a
Example 1 was repeated for the title protein. Replacing all or substantially all of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T and Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 321), aligned to the wild type (upper line, SEQ ID NO: 320).

Example 21: DARIA Duffy antigen/chemokine receptor isoform a
Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 323), aligned to the wild type (upper line, SEQ ID NO: 322).

Each predicted transmembrane region is underlined and exemplifies a fully modified domain of the invention. Thus, for example, the invention includes transmembrane domains each comprising one of the underlined domains. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of the depicted protein, or of a homologous sequence that retains one, two, three or optionally four or more of the native L, I, V and F amino acids set forth in the wild-type sequence. The wild-type sequence can be subjected to the method described above to select further transmembrane domain variants, as described in Example 1. Coding sequences can then be designed and recombinantly expressed, and the expressed proteins can be assayed for ligand binding as described herein.

Example 22: CD81 antigen
CD81 may play an important role in regulating lymphoma cell proliferation. It interacts with the 16 kDa Leu-13 protein to form a complex that may be involved in signal transduction. CD81 may also function as a viral receptor for HCV.

Example 1 was repeated for the title protein. Replacing each of the hydrophobic amino acids L, I, V and F in its transmembrane domains with Q, T or Y (L with Q, I and V with T, and F with Y) gives the following sequence (lower line, SEQ ID NO: 325), aligned to the wild type (upper line, SEQ ID NO: 324).

The predicted transmembrane regions exemplify the modified domains of the invention and include the following (SEQ ID NOs: 326, 327, 328, 329, 330, 331, 332 and 333, respectively).

Thus, for example, the invention includes transmembrane domains each comprising a modified ("mt") domain. Preferably, a TM1-containing protein herein includes one or more (e.g., all) of the extracellular and intracellular loop sequences (the non-underlined sequences). Additionally or alternatively, a TM1-containing protein herein includes one or more additional transmembrane regions (underlined sequences) of the depicted protein, or of a homologous sequence that retains one, two, three or optionally four or more of the native V, L, I and F amino acids set forth in the wild-type sequence.

Example 23: E. coli expression of QTY variants and CXCR4-QTY variants

1. Large-scale production of CXCR4-QTY in E. coli BL21(DE3)
The water-soluble GPCR CXCR4-QTY was produced in E. coli with an estimated yield of approximately 20 mg of purified protein per liter of routinely used LB medium. The estimated production cost is approximately $0.25 per milligram. Advantageously, this approach can readily provide gram quantities of water-soluble GPCRs, which can in turn facilitate determination of their structures.

2. Determining where water-soluble CXCR4-QTY is produced in E. coli cells
Water-soluble CXCR4-QTY was cloned into a pET vector. We first carried out a small-scale E. coli culture study (150 ml culture) to determine where the CXCR4-QTY protein is produced. After the cells were induced with IPTG and cultured for 4 hours at 24°C, we harvested and sonicated them. We then separated the lysate into two fractions by centrifugation at 14,637 × g (12,000 rpm). Next, we located the CXCR4-QTY protein by Western blot analysis with a specific anti-rho-tag monoclonal antibody. CXCR4-QTY protein was present in the supernatant fraction and absent from the pellet fraction, suggesting that the protein is fully water-soluble.

3. Estimated yield of CXCR4-QTY produced in the soluble fraction of E. coli cells
We then carried out a separate 150 ml culture and obtained approximately 6 mg of CXCR4-QTY purified with the 1D4 monoclonal antibody. We had underestimated the yield and had not expected such a surprisingly high one. As a result, we did not use enough rho-1D4-tag monoclonal antibody affinity beads to capture all of the CXCR4-QTY produced. A significant amount of CXCR4-QTY protein therefore did not bind to the beads; it appeared in the flow-through lane or was washed away. Despite this significant loss, we still obtained approximately 6 mg from the 150 ml culture, as can be seen from lanes 8 to 10 (elution fractions).

4. Determination of the thermal stability of purified water-soluble CXCR4-QTY protein
In most cases, a protein's structure determines its function. It is therefore important to know whether the purified CXCR4-QTY protein produced in E. coli is still correctly folded into a typical α-helical structure containing approximately 50% α-helix. We measured its secondary structure by circular dichroism (CD), recording CD spectra of the purified CXCR4-QTY protein at various temperatures to determine its thermal stability. The purified protein was relatively stable up to 55°C: it denatured only partially and gradually, with the CD signal decreasing by approximately 15%. Between 55°C and 65°C, denaturation increased progressively; a denaturation transition occurred between 65°C and 75°C, and at 75°C the protein was almost completely denatured.

We plotted the ellipticity at 222 nm against temperature to obtain the melting temperature (Tm) of the purified water-soluble CXCR4-QTY protein. From this plot, we estimated the Tm to be approximately 67°C, which suggests that the purified water-soluble CXCR4-QTY protein is very stable compared with many other soluble proteins. This thermal stability should make it easier to obtain diffraction-quality crystals, since higher thermal stability is known to favor better crystal lattice packing and thus improves the chances of solving a structure.
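One simple way to read a Tm off such a plot is to normalize the 222 nm signal between its low- and high-temperature ends and interpolate the temperature at which half of the total change has occurred. The sketch below assumes that two-state midpoint approach (the text does not say which fitting method was used) and runs it on synthetic, noise-free data generated with a Tm of 67°C; the numbers are illustrative, not the measured spectra:

```python
import math


def fraction_unfolded(signal):
    """Scale a melt curve so the first point is 0 (folded) and the last is 1 (unfolded)."""
    lo, hi = signal[0], signal[-1]
    return [(s - lo) / (hi - lo) for s in signal]


def midpoint_tm(temps, signal):
    """Temperature at which the normalized curve first reaches 0.5 (linear interpolation)."""
    f = fraction_unfolded(signal)
    for i in range(1, len(f)):
        if f[i - 1] < 0.5 <= f[i]:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f[i - 1]) * (t1 - t0) / (f[i] - f[i - 1])
    raise ValueError("normalized curve never crosses 0.5")


def synthetic_melt(temps, tm=67.0, width=3.0, folded=-20.0, unfolded=-2.0):
    """Hypothetical two-state ellipticity curve (mdeg at 222 nm), for illustration only."""
    return [folded + (unfolded - folded) / (1.0 + math.exp(-(t - tm) / width))
            for t in temps]


if __name__ == "__main__":
    temps = list(range(20, 91))  # 20-90°C in 1°C steps
    print(f"estimated Tm: {midpoint_tm(temps, synthetic_melt(temps)):.1f}°C")  # 67.0°C
```

With real, noisy data a nonlinear fit of a two-state model with sloping baselines is usually more robust than a single interpolated crossing.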

5. Additional G protein-coupled receptors
We selected ten G protein-coupled receptors (GPCRs) and designed their water-soluble forms using the QTY method described in US Patent Publication No. 2012/0252719A to Zhang et al. ("Zhang"), entitled "Water Soluble Membrane Proteins and Methods for the Preparation and Use Thereof". Alternatively, any of the proteins described herein can be selected.
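The substitution rule at the heart of the QTY method can be sketched in a few lines. This is a minimal sketch under stated assumptions: TM boundaries are supplied as 0-based, half-open (start, end) intervals; only the Q/T/Y choices are applied (the N and S alternatives recited in the claims are omitted); and because TMHMM 2.0, the predictor named in the claims, is not reproduced here, a much cruder Kyte-Doolittle hydropathy scan (19-residue window, cutoff 1.6) stands in as a rough check that the substituted segment no longer looks like a TM helix. The toy sequence is hypothetical:

```python
# QTY code: Leu->Gln, Ile/Val->Thr, Phe->Tyr (Q/T/Y choices only).
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}

# Kyte-Doolittle hydropathy scale (stand-in for a real TM predictor such as TMHMM).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def qty_variant(sequence: str, tm_regions: list[tuple[int, int]]) -> str:
    """Apply the QTY substitutions only inside the given 0-based, half-open TM intervals."""
    residues = list(sequence)
    for start, end in tm_regions:
        for i in range(start, end):
            residues[i] = QTY_MAP.get(residues[i], residues[i])
    return "".join(residues)


def looks_transmembrane(sequence: str, window: int = 19, cutoff: float = 1.6) -> bool:
    """True if any full window has a mean Kyte-Doolittle hydropathy above the cutoff."""
    return any(
        sum(KD[aa] for aa in sequence[i:i + window]) / window > cutoff
        for i in range(len(sequence) - window + 1)
    )


if __name__ == "__main__":
    # Hypothetical toy protein: a 19-residue hydrophobic stretch flanked by lysines.
    native = "KK" + "LLIVAFLLIVAFLLIVAFL" + "KK"
    designed = qty_variant(native, [(2, 21)])
    print(designed)  # KKQQTTAYQQTTAYQQTTAYQKK
    print(looks_transmembrane(native), looks_transmembrane(designed))  # True False
```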

6. Molecular cloning of genes
We successfully cloned and confirmed the native and QTY genes of the GPCRs in the cell-free protein expression plasmid vector pIVex2.3d and in the E. coli plasmid vectors pET28a and pET-duet-1.
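Cloning a designed QTY protein requires a DNA sequence encoding it, and embodiments (30) and (31) recite a codon-optimized polynucleotide for each variant. Real codon optimization weighs codon-usage frequencies, GC content and mRNA structure; the minimal sketch below only back-translates with one frequently used E. coli codon per amino acid (an assumed, simplified table, not the one used for these constructs) and checks that the DNA translates back to the input:

```python
# Assumed, simplified table: one frequently used E. coli codon per amino acid.
ECOLI_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC", "Q": "CAG",
    "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT", "L": "CTG", "K": "AAA",
    "M": "ATG", "F": "TTT", "P": "CCG", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAT", "V": "GTG", "*": "TAA",
}
REVERSE = {codon: aa for aa, codon in ECOLI_CODON.items()}


def back_translate(protein: str, add_stop: bool = True) -> str:
    """One fixed codon per residue; appends a TAA stop codon by default."""
    dna = "".join(ECOLI_CODON[aa] for aa in protein)
    return dna + ECOLI_CODON["*"] if add_stop else dna


def translate(dna: str) -> str:
    """Translate using only the codons in the table above (a sanity check, not a full code)."""
    return "".join(REVERSE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = back_translate("MQTY")
    print(dna)             # ATGCAGACCTATTAA
    print(translate(dna))  # MQTY*
```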

7. Production of water-soluble GPCRs
We produced several native and QTY proteins. Producing native GPCRs in a cell-free system required the detergent Brij35; without detergent, the protein precipitated as soon as it was produced. In contrast, when we tested the QTY variants in the presence and absence of detergent, the cell-free system produced soluble protein even without detergent.

We cloned the QTY variants into the E. coli in vivo expression plasmid vectors pET28a and pET-duet-1 for protein production in E. coli strain BL21(DE3). We purified several water-soluble GPCR proteins, including CXCR4 and CCR5, and used them for secondary structure analysis. We performed ligand-binding studies on CXCR4 with its natural ligand CXCL12 (SDF-1α). We also produced and purified a water-soluble GPCR variant, CCR5e, in E. coli. The CCR5e variant carried 58 amino acid changes (an approximately 18% change) and was purified to homogeneity using a monoclonal antibody specific for the rhodopsin tag. The blue-stained SDS gel showed a single band, indicating its purity. Judging from the size markers, the protein appears to be a pure homodimer (the crystal structure of native membrane-bound CXCR4 was also dimeric). Western blotting confirmed both monomeric and homodimeric forms of the CCR5e variant, as is common for GPCRs.

8. Secondary structure study of QTY CCR5e
We obtained a water-soluble QTY variant of the GPCR CCR5, designated CCR5e. We then performed secondary structure analysis on an Aviv Model 410 circular dichroism instrument and confirmed that the QTY CCR5e variant has a typical α-helical structure. By repeating the measurements at various temperatures, we also determined the Tm, i.e., the thermal stability, of the water-soluble CCR5e variant to be approximately 46°C. This Tm is adequate for crystal screening experiments.

9. Ligand-binding study of CXCR4 with CXCL12 (SDF-1α)
To ensure that the designed water-soluble QTY GPCRs still retain their biological function, i.e., that they still recognize and bind their natural ligands, we first used an ELISA to study binding of water-soluble CXCR4 to its natural ligand CXCL12 (also called SDF-1α). Assay concentrations ranged from 50 nM to 10 μM. The measured Kd was approximately 80 nM, compared with approximately 100 nM for native membrane-bound CXCR4 binding to SDF-1α, so the Kd of water-soluble CXCR4 is within an acceptable range. Further experiments using a more sensitive method, such as surface plasmon resonance (SPR), may be performed to obtain a more accurate Kd.
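How a Kd is extracted from such a titration depends on the binding model, which the text does not state. A minimal sketch, assuming a one-site saturation model, signal = Bmax·L/(Kd + L), with the ELISA signal proportional to bound receptor, and treating the 50 nM to 10 μM range as the ligand titration (an assumption). Kd is found by least squares over a grid, with Bmax solved in closed form at each grid point; the input data are synthetic, generated with Kd = 80 nM:

```python
def best_bmax(conc, signal, kd):
    """Least-squares Bmax for a fixed Kd under the one-site model."""
    x = [c / (kd + c) for c in conc]
    return sum(xi * yi for xi, yi in zip(x, signal)) / sum(xi * xi for xi in x)


def fit_kd(conc, signal, kd_grid):
    """Return (kd, bmax) minimizing the sum of squared residuals over kd_grid."""
    best = None
    for kd in kd_grid:
        bmax = best_bmax(conc, signal, kd)
        sse = sum((y - bmax * c / (kd + c)) ** 2 for c, y in zip(conc, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


if __name__ == "__main__":
    conc_nm = [50, 100, 250, 500, 1000, 2500, 5000, 10000]
    # Synthetic, noise-free signal with Kd = 80 nM and Bmax = 1.0 (illustration only).
    signal = [1.0 * c / (80 + c) for c in conc_nm]
    grid = [1 + 0.5 * k for k in range(2000)]  # 1 to 1000.5 nM in 0.5 nM steps
    kd, bmax = fit_kd(conc_nm, signal, grid)
    print(f"Kd ~ {kd:.1f} nM, Bmax ~ {bmax:.2f}")  # Kd ~ 80.0 nM, Bmax ~ 1.00
```

Note that the lowest concentration (50 nM) sits below the fitted Kd, so only the lower part of the curve constrains Kd; more points below 80 nM, or an SPR measurement as suggested above, would tighten the estimate.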

Although the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

This application may include the following inventions.
(1)
A computer-implemented method for performing a procedure for selecting water-soluble variants of a G protein-coupled receptor (GPCR), the method comprising:
inputting the sequence of the GPCR for analysis;
obtaining a variant of the GPCR in which a plurality of hydrophobic amino acids within the transmembrane (TM) domain α-helical segment (“TM region”) of the GPCR are substituted, wherein:
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V) and phenylalanine (F);
(b) each leucine (L) is independently substituted with glutamine (Q), asparagine (N) or serine (S);
(c) each isoleucine (I) and each valine (V) is independently substituted with threonine (T), asparagine (N) or serine (S); and
(d) each phenylalanine (F) is substituted with tyrosine (Y);
thereafter, obtaining an α-helical secondary structure result for the variant to confirm that α-helical secondary structure is maintained within the variant; and
obtaining a transmembrane region result for the variant to confirm the water solubility of the variant,
thereby selecting a water-soluble variant of the GPCR.
(2)
The method according to claim (1), wherein step (3) is performed before, simultaneously with, or after step (4).
(3)
The method according to claim (1) or (2), wherein, in step (2), a subset of the plurality of hydrophobic amino acids in one and the same TM region of the GPCR is substituted to create one member of a variant candidate library, and one or more different subsets of the plurality of hydrophobic amino acids are substituted to create further members of the library.
(4)
The method according to claim (3), further comprising ranking all members of the library based on a combined score, the combined score being a weighted combination of the α-helical secondary structure prediction result and the transmembrane region prediction result.
(5)
The method according to claim (1), further comprising ranking the variants using a ranking function.
(6)
The method according to claim (1), further comprising performing the method using a data processor.
(7)
The method according to claim (6), further comprising a memory coupled to the data processor.
(8)
The method according to claim (5), wherein the ranking function includes a secondary structure component and a water-solubility component.
(9)
The method according to claim (8), wherein the ranking function includes weighting values for the secondary structure component and/or the water-solubility component.
(10)
The method according to claim (4), further comprising selecting the N members with the highest combined scores to form a first library of variant candidates for the TM region, where N is a predetermined integer (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more).
(11)
The method according to claim (10), further comprising generating a library of variant candidates for one, two, three, four, five or all six of the other TM regions of the GPCR.
(12)
The method according to claim (11), further comprising replacing two or more TM regions of the GPCR with corresponding TM regions from the variant candidate libraries to create a combinatorial variant library.
(13)
The method according to any one of claims (1) to (12), wherein substantially all (e.g., all) of the leucines are substituted with glutamine.
(14)
The method according to any one of claims (1) to (13), wherein substantially all (e.g., all) of the isoleucines are substituted with threonine.
(15)
The method according to any one of claims (1) to (14), wherein substantially all (e.g., all) of the valines are substituted with threonine.
(16)
The method according to any one of claims (1) to (15), wherein substantially all (e.g., all) of the phenylalanines are substituted with tyrosine.
(17)
The method according to any one of claims (1) to (16), wherein one or more (e.g., 1, 2 or 3) of the leucines are not substituted.
(18)
The method according to any one of claims (1) to (17), wherein one or more (e.g., 1, 2 or 3) of the isoleucines are not substituted.
(19)
The method according to any one of claims (1) to (18), wherein one or more (e.g., 1, 2 or 3) of the valines are not substituted.
(20)
The method according to any one of claims (1) to (19), wherein one or more (e.g., 1, 2 or 3) of the phenylalanines are not substituted.
(21)
The method according to any one of claims (1) to (20), further comprising producing/expressing the combinatorial variants.
(22)
The method according to any one of claims (1) to (21), further comprising testing the combinatorial variants for ligand binding (e.g., by a yeast two-hybrid assay) and selecting those having substantially the same ligand binding as the GPCR.
(23)
The method according to any one of claims (1) to (22), further comprising testing the combinatorial variants for the biological function of the GPCR and selecting those having substantially the same biological function as the GPCR.
(24)
The method according to any one of claims (1) to (23), wherein the combinatorial variant library contains fewer than about 2 million members.
(25)
The method according to any one of claims (1) to (24), wherein the sequence of the GPCR includes information regarding the TM region of the GPCR.
(26)
The method according to any one of claims (1) to (25), wherein the sequence of the GPCR is obtained from a protein structure database (e.g., PDB, UniProt).
(27)
The method according to any one of claims (1) to (26), wherein the TM regions of the GPCR are predicted based on the sequence of the GPCR.
(28)
The method according to claim (27), wherein the TM regions of the GPCR are predicted using the TMHMM 2.0 (transmembrane prediction using hidden Markov models) software module.
(29)
The method according to claim (28), wherein the TMHMM 2.0 software module uses a dynamic baseline for peak searching.
(30)
The method of any one of claims (1) to (29), further comprising the step of providing a polynucleotide sequence of each variant of said GPCR.
(31)
The method according to claim (30), wherein the polynucleotide sequence is codon-optimized for expression in a host (e.g., a bacterium such as E. coli, a yeast such as budding yeast (Saccharomyces cerevisiae) or fission yeast, an insect cell such as an Sf9 cell, a non-human mammalian cell or a human cell).
(32)
The method according to any one of claims (1) to (31), wherein the scripted procedure comprises a VBA script.
(33)
The method according to any one of claims (1) to (32), wherein the scripted procedure is operable on a Linux system (e.g., Ubuntu 12.04 LTS), a Unix system, a Microsoft Windows operating system, an Android operating system or an Apple iOS operating system.
(34)
A water-soluble variant of a G protein-coupled receptor (GPCR) in which a plurality of hydrophobic amino acids in the transmembrane (TM) domain α-helical segment (“TM region”) of the GPCR are substituted, wherein:
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V) and phenylalanine (F);
(b) each leucine (L) is independently substituted with glutamine (Q), asparagine (N) or serine (S);
(c) each isoleucine (I) and each valine (V) is independently substituted with threonine (T), asparagine (N) or serine (S); and
(d) each phenylalanine (F) is substituted with tyrosine (Y);
and wherein all seven TM regions of the variant maintain α-helical secondary structure, and no transmembrane region is predicted.
(35)
The water-soluble variant according to claim (34), comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 4-11, 13-20, 22-29, 31-38, 40-47, 49-56 and 58-64.
(36)
The water-soluble variant according to claim (35), further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 3, 12, 21, 30, 39, 48 and 57.
(37)
The water-soluble variant according to claim (35) or (36), which binds to a CXCR4 ligand.
(38)
The water-soluble variant according to claim (34), comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 69-76, 78-85, 87, 89-96, 98-105, 107-114 and 116-123.
(39)
The water-soluble variant according to claim (38), further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 68, 77, 86, 88, 97, 106, 115 and 124.
(40)
The water-soluble variant according to claim (38) or (39), which binds to a CX3CR1 ligand.
(41)
The water-soluble variant according to claim (34), comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 128-135, 137-144, 146-153, 155-162, 164-171, 173 and 175-182.
(42)
The water-soluble variant according to claim (41), further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 127, 136, 145, 154, 163, 172, 174 and 183.
(43)
The water-soluble variant according to claim (41) or (42), which binds to a CCR3 ligand.
(44)
The water-soluble variant according to claim (34), comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 187-194, 196-203, 205-206, 208, 210-217, 219-225 and 227-234.
(45)
The water-soluble variant according to claim (44), further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 186, 195, 204, 207, 209, 218, 226 and 235.
(46)
The water-soluble variant according to claim (44) or (45), which binds to a CCR5 ligand.
(47)
The water-soluble variant according to claim (34), comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 236-243, 245-252, 254-261, 263-270, 272, 274-281 and 283-290.
(48)
The water-soluble variant according to claim (47), further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 235, 244, 253, 262, 271, 273, 282 and 291.
(49)
The water-soluble variant according to claim (47) or (48), which binds to a CXCR3 ligand.
(50)
The water-soluble variant according to claim (34), comprising one or more of the transmembrane domains set forth in any one of SEQ ID NOs: 2, 67, 126, 185, 327, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323 or 325.
(51)
The water-soluble variant according to claim (50), wherein the water-soluble variant is water-soluble and binds to a ligand of the homologous native transmembrane protein.
(52)
A method of producing a protein in a bacterium (e.g., E. coli), comprising:
(a) culturing the bacterium in a growth medium under conditions suitable for protein production;
(b) fractionating a lysate of the bacterium to produce a soluble fraction and an insoluble pellet fraction; and
(c) isolating the protein from the soluble fraction,
wherein (1) the protein is a variant of a G protein-coupled receptor (GPCR) according to any one of claims (34) to (51), and (2) the yield of the protein is at least 20 mg/L of growth medium (e.g., 30 mg/L, 40 mg/L, 50 mg/L or more).
(53)
The method according to claim (52), wherein the bacterium is E. coli BL21 and the growth medium is LB medium.
(54)
The method according to claim (52) or (53), wherein the protein is encoded by a plasmid within the bacterium.
(55)
The method according to any one of claims (52) to (54), wherein expression of the protein is under the control of an inducible promoter.
(56)
The method according to claim (55), wherein the inducible promoter is inducible by IPTG.
(57)
The method according to any one of claims (52) to (56), wherein the lysate is produced by sonication.
(58)
The method according to any one of claims (52) to (57), wherein the lysate is centrifuged at 14,500 x g or more to produce the soluble fraction.
(59)
A non-transitory computer-readable medium storing a set of instructions for performing the method according to any one of claims (1) to (33).
(60)
A data processing system operative to select a water-soluble variant of a membrane protein, the system comprising a data processor operative to perform amino acid substitutions, wherein the system ranks protein variants using a ranking function comprising a secondary structure component and a water-solubility component.
(61)
The system according to claim (60), further comprising a library of membrane proteins for processing by the system.
(62)
The system according to claim (60), further comprising a memory coupled to the data processor and storing coded instructions for running a substitution processor.
(63)
The system according to claim (60), operative for steps (a), (b), (c) and (d) of the method according to claim (1).
(64)
The system according to claim (60), further comprising a ranking function that is a weighted combination based on the secondary structure component.
(65)
The system according to claim (60), which communicates with an external program via a network.
(66)
The system according to claim (60), further comprising a database for storing water-soluble variants.
(67)
The system according to claim (60), further comprising instructions for performing dynamic baseline processing.
(68)
The system according to claim (60), further comprising an interface for selecting parameters of the method.
(69)
The system according to claim (60), further comprising the step of inputting the sequences set forth in claims (35) to (50).
(70)
A computer-implemented method for performing a procedure for selecting a water-soluble variant, the method comprising:
performing data processing to identify a membrane protein sequence for analysis; and
obtaining a variant of the membrane protein in which a plurality of hydrophobic amino acids in the transmembrane (TM) domain α-helical segment (“TM region”) of the membrane protein are substituted,
wherein the data processor:
determines an α-helical secondary structure result for the variant to confirm that α-helical secondary structure is maintained in the variant;
determines a transmembrane region result for the variant to confirm the water solubility of the variant; and
selects a water-soluble variant of the membrane protein.
(71)
The method according to claim (70), wherein the substitution comprises:
(a) selecting the hydrophobic amino acids from the group consisting of leucine (L), isoleucine (I), valine (V) and phenylalanine (F);
(b) independently substituting each leucine (L) with glutamine (Q), asparagine (N) or serine (S);
(c) independently substituting each isoleucine (I) and each valine (V) with threonine (T), asparagine (N) or serine (S); and
(d) substituting each phenylalanine (F) with tyrosine (Y).
(72)
The method according to claim (71), wherein a subset of the plurality of hydrophobic amino acids in one and the same TM region of a GPCR is substituted to create one member of a variant candidate library, and one or more different subsets of the plurality of hydrophobic amino acids are substituted to create further members of the library.
(73)
The method according to claim (70), further comprising ranking all members of the library based on a combined score, the combined score being a weighted combination of the α-helical secondary structure prediction result and the transmembrane region prediction result.
(74)
The method according to claim (70), further comprising ranking the variants using a ranking function.
(75)
The method according to claim (70), further comprising a memory coupled to the data processor.
(76)
The method according to claim (74), wherein the ranking function includes a secondary structure component and a water-solubility component.
(77)
The method according to claim (76), wherein the ranking function includes weighting values for the secondary structure component and/or the water-solubility component.
(78)
The method according to claim (73), further comprising selecting the N members with the highest combined scores to form a first library of variant candidates for the TM region, where N is a predetermined integer (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more).
(79)
The method according to claim (78), further comprising generating a library of variant candidates for one, two, three, four, five or all six of the other TM regions of the GPCR.
(80)
The method according to claim (79), further comprising replacing two or more TM regions of the GPCR with corresponding TM regions from the variant candidate libraries to create a combinatorial variant library.
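Embodiments (3) to (10) describe building a per-region candidate library by substituting different subsets of the hydrophobic positions in one TM region, scoring each member by a weighted combination of a secondary-structure result and a transmembrane-region (water-solubility) result, and keeping the top N. A minimal sketch of that flow, assuming the two predictor outputs are supplied as functions returning scores in [0, 1] (the embodiments do not fix the scales, weights or predictors); the scoring functions in the usage lines are toy placeholders, not real predictors, and exhaustive subset enumeration grows as 2^k for k substitutable sites:

```python
from itertools import combinations
from typing import Callable

QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def subset_library(segment: str) -> list[str]:
    """Every variant of one TM segment with a non-empty subset of its L/I/V/F sites QTY-substituted."""
    sites = [i for i, aa in enumerate(segment) if aa in QTY_MAP]
    library = []
    for k in range(1, len(sites) + 1):
        for chosen in combinations(sites, k):
            residues = list(segment)
            for i in chosen:
                residues[i] = QTY_MAP[residues[i]]
            library.append("".join(residues))
    return library


def top_n(library: list[str], n: int,
          helix_score: Callable[[str], float],
          solubility_score: Callable[[str], float],
          w_helix: float = 0.5, w_sol: float = 0.5) -> list[str]:
    """Rank members by a weighted combined score (highest first, ties keep order) and keep n."""
    def combined(seq: str) -> float:
        return w_helix * helix_score(seq) + w_sol * solubility_score(seq)
    return sorted(library, key=combined, reverse=True)[:n]


if __name__ == "__main__":
    lib = subset_library("ALVF")
    print(len(lib), lib[:3])  # 7 ['AQVF', 'ALTF', 'ALVY']

    # Toy placeholder scores (hypothetical): a constant helix score and the
    # fraction of Q/T/Y residues as a stand-in "solubility" score.
    def helix(seq):
        return 1.0

    def sol(seq):
        return sum(aa in "QTY" for aa in seq) / len(seq)

    print(top_n(lib, 2, helix, sol))  # ['AQTY', 'AQTF']
```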

Claims

1. A computer-implemented method for performing a procedure for selecting a water-soluble variant of a G protein-coupled receptor (GPCR), the method comprising:
(1) inputting the sequence of the GPCR for analysis;
(2) obtaining a variant of the GPCR in which a plurality of hydrophobic amino acids within the transmembrane (TM) domain α-helical segments ("TM regions") of the GPCR are substituted, wherein:
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V), and phenylalanine (F);
(b) each leucine (L) is independently substituted with glutamine (Q), asparagine (N), or serine (S);
(c) each isoleucine (I) and each valine (V) is independently substituted with threonine (T), asparagine (N), or serine (S); and
(d) each phenylalanine (F) is substituted with tyrosine (Y);
(3) subsequently obtaining an α-helical secondary structure result for the variant to confirm that α-helical secondary structure is maintained in the variant; and
(4) obtaining a transmembrane region result for the variant to confirm the water solubility of the variant,
thereby selecting a water-soluble variant of the GPCR.
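The substitution rules in steps (a) to (d) can be illustrated with a short Python sketch. It applies one fixed choice per rule (L→Q, I→T, V→T, F→Y, often called the QTY code); the claim also permits asparagine or serine as alternatives, which this sketch does not explore. The function name `qty_substitute` and the example segment are hypothetical and are not taken from the specification.

```python
# Hypothetical helper: one fixed choice per substitution rule (the "QTY" choice).
# The claim also allows N or S for L, and N or S for I and V; not explored here.
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def qty_substitute(tm_segment: str) -> str:
    """Return the TM segment with every L, I, V and F replaced per QTY_MAP.

    All other residues are left unchanged.
    """
    return "".join(QTY_MAP.get(residue, residue) for residue in tm_segment.upper())


if __name__ == "__main__":
    segment = "LIVFAGLLIVW"  # made-up TM-like segment, not a SEQ ID NO
    print(qty_substitute(segment))  # prints QTTYAGQQTTW
```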

2. The method of claim 1, wherein step (3) is performed before, simultaneously with, or after step (4).

3. The method of claim 1 or 2, wherein step (2) comprises substituting a subset of the plurality of hydrophobic amino acids in one and the same TM region of the GPCR to create a member of a variant candidate library, and substituting one or more different subsets of the plurality of hydrophobic amino acids to create further members of the library.
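A minimal sketch of how the library members of claim 3 might be enumerated: each member substitutes a different subset of the hydrophobic positions within the same TM region. It assumes the QTY choice for every substituted residue and enumerates subsets of a fixed size with `itertools.combinations`; the helper names are hypothetical.

```python
from itertools import combinations

# Assumed substitution choice (QTY); the claims also allow N or S alternatives.
QTY_MAP = {"L": "Q", "I": "T", "V": "T", "F": "Y"}


def hydrophobic_positions(segment: str) -> list[int]:
    """0-based indices of the L, I, V and F residues in the segment."""
    return [i for i, aa in enumerate(segment) if aa in QTY_MAP]


def subset_variants(segment: str, subset_size: int) -> list[str]:
    """Enumerate variants that substitute exactly `subset_size` hydrophobic positions."""
    positions = hydrophobic_positions(segment)
    variants = []
    for subset in combinations(positions, subset_size):
        residues = list(segment)
        for i in subset:
            residues[i] = QTY_MAP[residues[i]]  # substitute only this subset
        variants.append("".join(residues))
    return variants
```

For example, `subset_variants("ALGV", 1)` gives one member with only the leucine substituted and one with only the valine substituted.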

4. The method of claim 3, further comprising ranking all members of the library based on a combined score, the combined score being a weighted combination of the α-helical secondary structure prediction result and the transmembrane region prediction result.
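The combined score of claim 4 is described only as a weighted combination of the two prediction results. The sketch below assumes both results have already been reduced to numbers between 0 and 1 (a helix score, and a probability of a predicted TM region that is inverted so that higher means more water-soluble) and uses equal weights by default; those encodings, the weights, and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str
    helix_score: float     # assumed: 0..1, higher = alpha-helix better retained
    tm_probability: float  # assumed: 0..1, probability of a predicted TM region


def combined_score(c: Candidate, w_helix: float = 0.5, w_soluble: float = 0.5) -> float:
    """Weighted sum of a helix component and a water-solubility component.

    The water-solubility component is taken as 1 - tm_probability (an assumption).
    """
    return w_helix * c.helix_score + w_soluble * (1.0 - c.tm_probability)


def rank_library(library: list[Candidate], w_helix: float = 0.5,
                 w_soluble: float = 0.5) -> list[Candidate]:
    """Return library members sorted from highest to lowest combined score."""
    return sorted(library, key=lambda c: combined_score(c, w_helix, w_soluble),
                  reverse=True)
```

Changing `w_helix` and `w_soluble` shifts the balance between preserving helices and removing predicted TM regions.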

5. The method of claim 1, further comprising ranking the variants using a ranking function.

6. The method of claim 1, further comprising performing the method using a data processor.

7. The method of claim 6, further comprising a memory coupled to the data processor.

8. The method of claim 5, wherein the ranking function includes secondary structure components and water-soluble components.

9. The method of claim 8, wherein the ranking function includes weighting values for the secondary structure components and/or the water-soluble components.

10. The method of claim 4, further comprising selecting the N members with the highest combined scores to form a first library of variant candidates for the TM region, where N is a predetermined integer (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more).
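Selecting the N best-scoring members does not require sorting the whole library. This sketch assumes the combined scores are already available in a dictionary keyed by sequence and uses `heapq.nlargest`; the function name `top_n` is hypothetical.

```python
import heapq


def top_n(scored: dict[str, float], n: int) -> list[str]:
    """Return the n sequences with the highest combined scores, best first.

    If n exceeds the library size, every sequence is returned.
    """
    return heapq.nlargest(n, scored, key=scored.get)
```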

11. The method of claim 10, further comprising generating a library of variant candidates for one, two, three, four, five, or all six other TM regions of the GPCR.

12. The method of claim 11, further comprising replacing two or more TM regions of the GPCR with corresponding TM regions in the variant candidate library to create a combinatorial variant library.
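One way to picture the combinatorial variant library of claim 12 is as the Cartesian product of the per-region candidate libraries. The sketch assumes the wild-type TM segments are supplied in order and that regions without a candidate library keep their wild-type segment; it yields combinations lazily with `itertools.product` so a large library need not be held in memory. All names are illustrative.

```python
from itertools import product
from typing import Iterator


def combinatorial_library(wild_type_tms: list[str],
                          candidate_libs: dict[int, list[str]]) -> Iterator[tuple[str, ...]]:
    """Yield one tuple of TM segments (TM1..TMn) per combinatorial variant.

    wild_type_tms: the GPCR's TM segments in order.
    candidate_libs: maps a 0-based TM index to its candidate list; regions
    without an entry keep the wild-type segment (an assumption of this sketch).
    """
    choices = [candidate_libs.get(i, [wt]) for i, wt in enumerate(wild_type_tms)]
    yield from product(*choices)
```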

13. The method of any one of claims 1 to 12, wherein substantially all (e.g., all) of the leucines are replaced with glutamine.

14. The method of any one of claims 1 to 13, wherein substantially all (e.g., all) of the isoleucines are replaced with threonine.

15. The method of any one of claims 1 to 14, wherein substantially all (e.g., all) of the valines are replaced with threonine.

16. The method of any one of claims 1 to 15, wherein substantially all (e.g., all) of the phenylalanines are substituted with tyrosine.

17. The method of any one of claims 1 to 16, wherein one or more (e.g., 1, 2, or 3) of the leucines are unsubstituted.

18. The method of any one of claims 1 to 17, wherein one or more (e.g., 1, 2, or 3) of the isoleucines are unsubstituted.

19. The method of any one of claims 1 to 18, wherein one or more (e.g., 1, 2, or 3) of the valines are unsubstituted.

20. The method of any one of claims 1 to 19, wherein one or more (e.g., 1, 2, or 3) of the phenylalanines are unsubstituted.

21. The method of any one of claims 1 to 20, further comprising producing/expressing the combinatorial variants.

22. The method of any one of claims 1 to 21, further comprising testing the combinatorial variants for ligand binding (e.g., in a yeast two-hybrid assay) and selecting those having substantially the same ligand binding as the GPCR.

23. The method of any one of claims 1 to 22, further comprising testing the combinatorial variants for the biological function of the GPCR and selecting those having substantially the same biological function as the GPCR.

24. The method of any one of claims 1-23, wherein the combinatorial variant library contains less than about 2 million members.
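The size bound in claim 24 can be related to the per-region library sizes. Assuming, for illustration only, that the combinatorial library draws every one of the seven TM regions from a candidate library of size $N_i$, its size is the product of those sizes; with the same $N$ for every region:

```latex
|\mathcal{L}| = \prod_{i=1}^{7} N_i, \qquad
N_i = 7:\; 7^{7} = 823\,543 < 2\times 10^{6}, \qquad
N_i = 8:\; 8^{7} = 2\,097\,152 > 2\times 10^{6}.
```

Under that assumption, up to seven candidates per TM region keeps the library below about 2 million members, while eight per region already exceeds it.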

25. The method according to any one of claims 1 to 24, wherein the sequence of the GPCR comprises information regarding the TM region of the GPCR.

26. The method according to any one of claims 1 to 25, wherein the sequence of the GPCR is obtained from a protein structure database (e.g., PDB, UniProt).

27. The method according to any one of claims 1 to 26, wherein the TM region of the GPCR is predicted based on the sequence of the GPCR.

28. The method of claim 27, wherein the TM region of the GPCR is predicted using a TMHMM 2.0 (TransMembrane prediction using Hidden Markov Models) software module.

29. The method of claim 28, wherein the TMHMM 2.0 software module utilizes a dynamic baseline for peak search.
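TMHMM 2.0 is a hidden Markov model and is not reproduced here. As a much simpler, swapped-in stand-in that shows why the substitutions remove predicted TM regions, the sketch below uses a Kyte-Doolittle hydropathy sliding window. The 19-residue window and 1.6 threshold are the values commonly cited for that method, not parameters from this specification, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (stand-in for TMHMM 2.0, not equivalent to it).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def has_predicted_tm(seq: str, window: int = 19, threshold: float = 1.6) -> bool:
    """Crude TM call: True if any window's mean hydropathy exceeds the threshold."""
    return any(score > threshold for score in hydropathy_windows(seq, window))
```

On this scale L (3.8), I (4.5), V (4.2) and F (2.8) become Q (-3.5), T (-0.7) and Y (-1.3), so a hydrophobic stretch that exceeds the threshold typically falls well below it after substitution.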

30. The method of any one of claims 1 to 29, further comprising the step of providing a polynucleotide sequence of each variant of said GPCR.

31. The method of claim 30, wherein the polynucleotide sequence is codon-optimized for expression in a host (e.g., a bacterium such as E. coli, a yeast such as Saccharomyces cerevisiae or fission yeast, an insect cell such as an Sf9 cell, a non-human mammalian cell, or a human cell).
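The claims do not specify a codon-optimization algorithm. A minimal sketch, assuming the simplest "one amino acid, one codon" approach for E. coli, back-translates a variant using a single preferred codon per residue. The codon table is an illustrative assumption; a practical pipeline would use a measured codon-usage table for the chosen host and would also check features such as GC content and repeats.

```python
# Illustrative one-codon-per-residue table for E. coli (an assumption; each codon
# encodes the listed amino acid in the standard genetic code).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return a DNA coding sequence using one preferred codon per amino acid."""
    dna = "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    return dna + STOP if add_stop else dna
```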

32. The method according to any preceding claim, wherein the scripted procedure comprises a VBA script.

33. The method according to claim 32, wherein the scripted procedure is operable on a Linux system (e.g., Ubuntu 12.04 LTS), a Unix system, a Microsoft Windows operating system, an Android operating system, or an Apple iOS operating system.

34. A water-soluble variant of a G protein-coupled receptor (GPCR) in which a plurality of hydrophobic amino acids in the transmembrane (TM) domain α-helical segments ("TM regions") of the GPCR are substituted, wherein:
(a) the hydrophobic amino acids are selected from the group consisting of leucine (L), isoleucine (I), valine (V), and phenylalanine (F);
(b) each leucine (L) is independently substituted with glutamine (Q), asparagine (N), or serine (S);
(c) each isoleucine (I) and each valine (V) is independently substituted with threonine (T), asparagine (N), or serine (S); and
(d) each phenylalanine (F) is substituted with tyrosine (Y);
the water-soluble variant being characterized in that α-helical secondary structure is maintained in all seven TM regions of the variant and no transmembrane region is predicted.

35. The water-soluble variant of claim 34, comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 4-11, 13-20, 22-29, 31-38, 40-47, 49-56, and 58-64.

36. The water-soluble variant of claim 35, further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 3, 12, 21, 30, 39, 48 and 57.

37. The water-soluble variant according to claim 35 or 36, which binds a CXCR4 ligand.

38. The water-soluble variant according to claim 34, comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 69-76, 78-85, 87, 89-96, 98-105, 107-114, and 116-123.

39. The water-soluble variant of claim 38, further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 68, 77, 86, 88, 97, 106, 115 and 124.

40. The water-soluble variant according to claim 38 or 39, which binds a CX3CR1 ligand.

41. The water-soluble variant according to claim 34, comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 128-135, 137-144, 146-153, 155-162, 164-171, 173, and 175-182.

42. The water-soluble variant of claim 41, further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 127, 136, 145, 154, 163, 172, 174, and 183.

43. A water-soluble variant according to claim 41 or 42, which binds to a CCR3 ligand.

44. The water-soluble variant according to claim 34, comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 187-194, 196-203, 205-206, 208, 210-217, 219-225, and 227-234.

45. The water-soluble variant of claim 44, further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 186, 195, 204, 207, 209, 218, 226, and 235.

46. The water-soluble variant according to claim 44 or 45, which binds a CCR5 ligand.

47. The water-soluble variant according to claim 34, comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 236-243, 245-252, 254-261, 263-270, 272, 274-281, and 283-290.

48. The water-soluble variant of claim 47, further comprising one or more amino acid sequences selected from the group consisting of SEQ ID NOs: 235, 244, 253, 262, 271, 273, 282 and 291.

49. The water-soluble variant according to claim 47 or 48, which binds a CXCR3 ligand.

50. The water-soluble variant of claim 34, comprising one or more transmembrane domains as set forth in any one of SEQ ID NOs: 2, 67, 126, 185, 327, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, or 325.

51. The water-soluble variant of claim 50, wherein the variant is water-soluble and binds a ligand of the homologous native transmembrane protein.

52. A method of producing a protein in a bacterium (e.g., E. coli), wherein (1) the protein is a variant of a G protein-coupled receptor (GPCR) according to any one of claims 34 to 51, and (2) the yield of the protein is at least 20 mg/L (e.g., 30 mg/L, 40 mg/L, 50 mg/L or more) of growth medium, the method comprising:
(a) culturing the bacterium in a growth medium under conditions suitable for protein production;
(b) fractionating the bacterial lysate to produce a soluble fraction and an insoluble pellet fraction; and
(c) isolating the protein from the soluble fraction.

53. The method of claim 52, wherein the bacterium is E. coli BL21 and the growth medium is LB medium.

54. The method of claim 52 or 53, wherein the protein is encoded by a plasmid within the bacterium.

55. The method of any one of claims 52 to 54, wherein expression of the protein is under the control of an inducible promoter.

56. The method of claim 55, wherein the inducible promoter is inducible by IPTG.

57. The method of any one of claims 52 to 56, wherein the lysate is produced by sonication.

58. The method of any one of claims 52 to 57, wherein the lysate is centrifuged at 14,500 × g or higher to produce the soluble fraction.

59. A non-transitory computer-readable medium storing a set of instructions for performing the method of any one of claims 1 to 33.

60. A data processing system operative to select a water-soluble variant of a membrane protein, the system comprising: a data processor operative to perform amino acid substitutions; and a ranking function, comprising a secondary structure component and a water-soluble component, for ranking protein variants.

61. The system of claim 60, further comprising a library of membrane proteins for processing by the system.

62. The system of claim 60, further comprising a memory coupled to the data processor that stores coded instructions for executing the substitution process.

63. The system of claim 60, operative to perform steps (a), (b), (c), and (d) of the method of claim 1.

64. The system of claim 60, further comprising a ranking function that is a weighted combination based on the secondary structure components.

65. The system of claim 60, which communicates with an external program via a network.

66. The system of claim 60, further comprising a database for storing water-soluble variants.

67. The system of claim 60, further comprising instructions for performing dynamic baseline processing.

68. The system of claim 60, further comprising an interface for selecting parameters of the method.

69. The system of claim 60, further comprising inputting a sequence according to any one of claims 35 to 50.

70. A computer-implemented method for performing steps for selecting a water-soluble variant, the method comprising:
processing data to identify a membrane protein sequence for analysis;
obtaining a variant of the membrane protein in which a plurality of hydrophobic amino acids in the transmembrane (TM) domain α-helical segments ("TM regions") of the membrane protein are substituted;
determining, by a data processor, an α-helical secondary structure result for the variant to confirm maintenance of α-helical secondary structure in the variant; and
determining, by the data processor, a transmembrane region result for the variant to confirm the water solubility of the variant,
thereby selecting a water-soluble variant of the membrane protein.
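The selection logic of claim 70 can be sketched as a filter over candidate variants. The claim does not name specific predictors, so the sketch takes them as plug-in callables: one reporting whether α-helical structure is retained, the other counting predicted TM regions. These signatures and names are assumptions for illustration.

```python
from typing import Callable, Iterable

# Hypothetical predictor signatures; real tools would be wrapped to fit them.
HelixPredictor = Callable[[str], bool]  # True if alpha-helical structure is retained
TMPredictor = Callable[[str], int]      # number of predicted TM regions


def select_water_soluble(variants: Iterable[str],
                         helix_retained: HelixPredictor,
                         count_tm_regions: TMPredictor) -> list[str]:
    """Keep variants that retain alpha-helix and have no predicted TM region."""
    return [v for v in variants
            if helix_retained(v) and count_tm_regions(v) == 0]
```

In use, the two callables would wrap whatever secondary-structure and TM-prediction tools the pipeline adopts; simple lambdas are enough to exercise the filter.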

71. The method of claim 70, wherein the substitution comprises:
(a) selecting the hydrophobic amino acids from the group consisting of leucine (L), isoleucine (I), valine (V), and phenylalanine (F);
(b) independently substituting each leucine (L) with glutamine (Q), asparagine (N), or serine (S);
(c) independently substituting each isoleucine (I) and each valine (V) with threonine (T), asparagine (N), or serine (S); and
(d) substituting each phenylalanine (F) with tyrosine (Y).
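Claim 71 allows three choices for each leucine, isoleucine, and valine and one for each phenylalanine, so the number of fully substituted variants of a segment grows as 3 to the power of the number of L, I, and V residues. The sketch below enumerates and counts those variants with `itertools.product`; the names are hypothetical.

```python
from itertools import product
from typing import Iterator

# Allowed replacements per substitution rules (b) to (d).
ALLOWED = {
    "L": ("Q", "N", "S"),
    "I": ("T", "N", "S"),
    "V": ("T", "N", "S"),
    "F": ("Y",),
}


def all_allowed_variants(segment: str) -> Iterator[str]:
    """Yield every variant in which each L/I/V/F takes one allowed replacement."""
    options = [ALLOWED.get(aa, (aa,)) for aa in segment.upper()]
    for combo in product(*options):
        yield "".join(combo)


def count_allowed_variants(segment: str) -> int:
    """Number of variants all_allowed_variants would yield, without enumerating."""
    n = 1
    for aa in segment.upper():
        n *= len(ALLOWED.get(aa, (aa,)))
    return n
```

A segment with 10 L, I, or V residues already gives 3^10 = 59,049 variants, which is one reason to rank candidates and keep only the top N.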

72. The method of claim 71, further comprising substituting one subset of the plurality of hydrophobic amino acids in one and the same TM region of a GPCR to create a member of a variant candidate library, and substituting one or more different subsets of the plurality of hydrophobic amino acids to create additional members of the library.

73. The method of claim 72, further comprising ranking all members of the library based on a combined score, the combined score being a weighted combination of the α-helical secondary structure prediction result and the transmembrane region prediction result.

74. The method of claim 70, further comprising ranking the variants using a ranking function.

75. The method of claim 70, further comprising a memory coupled to the data processor.

76. The method of claim 74, wherein the ranking function includes secondary structure components and water-soluble components.

77. The method of claim 76, wherein the ranking function includes weighting values for the secondary structure components and/or the water-soluble components.

78. The method of claim 73, further comprising selecting the N members with the highest combined scores to form a first library of variant candidates for the TM region, where N is a predetermined integer (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more).

79. The method of claim 78, further comprising generating a library of candidate variants for one, two, three, four, five or all six other TM regions of the GPCR.

80. The method of claim 79, further comprising replacing two or more TM regions of the GPCR with corresponding TM regions in the variant candidate library to create a combinatorial variant library.