JPH11501520A

JPH11501520A - Nucleotide sequence of Haemophilus influenzae Rd genome, fragments thereof and uses thereof

Info

Publication number: JPH11501520A
Application number: JP8531888A
Authority: JP
Inventors: Fleischmann, Robert D.; Adams, Mark D.; White, Owen; Smith, Hamilton O.; Venter, J. Craig
Original assignee: Human Genome Sciences Inc; Johns Hopkins University
Current assignee: Human Genome Sciences Inc; Johns Hopkins University
Priority date: 1995-04-21
Filing date: 1996-04-22
Publication date: 1999-02-09
Also published as: AU5552396A; WO1996033276A1; CA2218741A1; EP0821737A4; EP0821737A1

Abstract

(57) [Abstract] The present invention provides the sequence of the entire genome of Haemophilus influenzae Rd, SEQ ID NO:1. The invention further provides this sequence information stored on computer readable media, together with computer-based systems and methods that facilitate its use. In addition to the whole-genome sequence, the invention identifies more than 1,700 protein-encoding fragments of the genome. By their position relative to the unique NotI restriction endonuclease site, it also identifies any regulatory elements that modulate expression of the protein-encoding fragments of the Haemophilus genome.

Description

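[Detailed Description of the Invention]
Nucleotide Sequence of the Haemophilus influenzae Rd Genome, Fragments Thereof and Uses Thereof

Part of the work performed during development of this invention was funded by the U.S. Government. The Government may have certain rights in this invention (NIH-5R01GM48251).

Field of the Invention

The present invention relates to the field of molecular biology. It discloses the nucleotide sequence of Haemophilus influenzae and compositions comprising fragments of that sequence. It also discloses methods of using them in industrial fermentation and pharmaceutical development.

Background of the Invention

The complete genome sequence of a free-living cellular organism has not yet been determined. The first Mycobacterium sequence is expected to be completed by 1996, and those of Escherichia coli and S. cerevisiae by 1998. These projects rely on random and/or directed sequencing of overlapping cosmid clones. No one has yet attempted to sequence on the order of one megabase or more by a random shotgun approach.

Haemophilus influenzae is a small (approximately 0.4 x 1 micron), non-motile, non-spore-forming, Gram-negative bacterium whose only natural host is humans. It colonizes the upper respiratory mucosa of children and adults, and it causes otitis media and respiratory tract infections, mostly in children. The most serious complication is meningitis, which leaves neurological sequelae in up to 50% of affected children. Six H. influenzae serotypes (a through f) have been identified on the basis of immunologically distinct capsular polysaccharide antigens. Many nontypeable strains are also known. Serotype b causes the majority of human disease.

Interest in the medically important aspects of H. influenzae biology has focused particularly on the genes that determine the organism's virulence characteristics. Many of the genes contributing to the capsular polysaccharide have been mapped and sequenced (Kroll et al., Mol. Microbiol. 5(6):1549-1560 (1991)). Several outer membrane protein (OMP) genes have been identified and sequenced (Langford et al., J. Gen. Microbiol. 138:155-159 (1992)). The lipooligosaccharide (LOS) component of the outer membrane and the genes of its synthetic pathway have been studied intensively (Weiser et al., J. Bacteriol. 172:3304-3309 (1990)). A vaccine has been available since 1984, but research on outer membrane components has been motivated in part by the need for an improved one. Recently, the catalase gene was characterized and sequenced as a possible virulence-related gene (Bishni et al., in press). Elucidating the H. influenzae genome will improve understanding of how the organism causes invasive disease and how infection can best be combated.

H. influenzae has a highly efficient natural DNA transformation system. This system has been studied intensively in the nonencapsulated (R), serotype d strain (Kahn and Smith, J. Membrane Biology 81:89-103 (1984)). At least 16 transformation-specific genes have been identified and sequenced. Of these, four are regulatory genes (Redfield, J. Bacteriol. 173:5612-5618 (1991) and Chandler, Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992)), at least two are involved in the recombination process (Barouki and Smith, J. Bacteriol.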
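163(2):629-634 (1985)), and at least seven are directed to the membrane and periplasmic space (Tomb et al., Gene 104:1-10 (1991) and Tomb, Proc. Natl. Acad. Sci. USA 89:10252-10256 (1992)), where they appear to function as structural components or in assembling the DNA transport machinery. Transformation of H. influenzae Rd has many interesting features:
- sequence-specific DNA uptake;
- rapid uptake of several double-stranded DNA molecules per competent cell into membrane compartments called transformasomes;
- linear translocation of a single strand of donor DNA into the cytoplasm;
- pairing and recombination of that single strand with the chromosome by a single-strand displacement mechanism.

The H. influenzae Rd transformation system is the most completely studied among Gram-negative organisms, and it differs from Gram-positive systems in many respects.

Pulsed-field agarose gel electrophoresis of restriction digests puts the size of the H. influenzae Rd genome at approximately 1.9 Mb, about 40% the size of the E. coli genome (Lee and Smith, J. Bacteriol. 170:4402-4405 (1988)). The restriction map of H. influenzae is circular (Lee et al., J. Bacteriol. 171:3016-3024 (1989); Redfield and Lee, "Haemophilus influenzae Rd," pp. 2110-2112, in O'Brien, S.J. (ed.), Genetic Maps: Locus Maps of Complex Genomes, Cold Spring Harbor Press, New York). Various genes have been mapped to restriction fragments by Southern hybridization probing of restriction-digested DNA bands. This map will be valuable for verifying the assembly of a complete genome sequence built from randomly sequenced fragments. GenBank currently contains about 100 kb of nonredundant H. influenzae DNA sequence, roughly half from serotype b and half from Rd.

Summary of the Invention

The present invention is based on the sequencing of the Haemophilus influenzae Rd genome. The resulting primary nucleotide sequence is provided in SEQ ID NO:1. The invention provides this sequence, or a representative fragment of it, in a form that a skilled artisan can readily use, analyze and interpret. In one embodiment, the invention is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence depicted in SEQ ID NO:1. The invention further provides nucleotide sequences at least 99.9% identical to the nucleotide sequence of SEQ ID NO:1.

The nucleotide sequence of SEQ ID NO:1, a representative fragment of it, or a nucleotide sequence at least 99.9% identical to it can be provided in a variety of media to facilitate its use. In one application of this embodiment, the sequences of the invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media such as floppy disks, hard disk storage media and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM;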
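and hybrids of these categories, such as magnetic/optical storage media.

The invention further provides systems, particularly computer-based systems, that hold the sequence information described herein in a data storage means. Such systems are designed to identify commercially important fragments of the H. influenzae Rd genome.

Another embodiment of the invention is directed to isolated fragments of the H. influenzae Rd genome. These fragments include, but are not limited to:
- fragments that encode peptides (hereinafter open reading frames, or ORFs);
- fragments that modulate expression of an operably linked ORF (hereinafter expression modulating fragments, or EMFs);
- fragments that mediate uptake of a linked DNA fragment into a cell (hereinafter uptake modulating fragments, or UMFs);
- fragments that can be used to diagnose the presence of H. influenzae Rd in a sample (hereinafter diagnostic fragments, or DFs).

The ORF fragments of the H. influenzae Rd genome disclosed in Tables 1(a) and 2, and the EMFs found 5' to those ORFs, can be used as polynucleotide reagents in numerous ways. These sequences can serve as diagnostic probes or diagnostic amplification primers for detecting a specific microbe in a sample. They can also be used to produce commercially important medical products and to selectively control gene expression.

The invention further includes recombinant constructs comprising one or more fragments of the H. influenzae Rd genome of the invention. These constructs comprise a vector, such as a plasmid or viral vector, into which a fragment of H. influenzae Rd has been inserted.

The invention further provides host cells containing any one of the isolated fragments of the H. influenzae Rd genome of the invention. The host cells can be higher eukaryotic cells such as mammalian cells, lower eukaryotic cells such as yeast cells, or prokaryotic cells such as bacterial cells.

The invention is further directed to isolated proteins encoded by the ORFs of the invention. Any of these proteins can be obtained by a variety of methods known in the art:
- At the simplest level, the amino acid sequence can be synthesized on a commercially available peptide synthesizer.
- Alternatively, the protein can be purified from bacterial cells that produce it naturally.
- Finally, the protein can be purified from cells that have been altered to express it.

The invention further provides methods of obtaining homologs of the fragments of the H. influenzae Rd genome of the invention and of the proteins encoded by its ORFs. A person skilled in the art can obtain such homologs by using the nucleotide and amino acid sequences disclosed herein as probes or primers, together with techniques such as PCR cloning and colony/plaque hybridization.

The invention further provides antibodies that selectively bind one of the proteins of the invention. Such antibodies include both monoclonal and polyclonal antibodies. The invention also provides hybridomas that produce these antibodies. A hybridoma is an immortalized cell line capable of secreting a specific monoclonal antibody.

The invention further provides methods of identifying a test sample derived from a cell that expresses one of the ORFs of the invention, or a homolog of it. Such methods comprise incubating the test sample with one or more antibodies or DFs of the invention, under conditions that allow a skilled artisan to determine whether the sample contains the ORF or a product produced from it.

In another embodiment, the invention provides kits containing the reagents needed to carry out these assays. Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers comprising: (a) a first container comprising one of the antibodies or one of the DFs of the invention; and (b) one or more other containers comprising one or more of the following reagents: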
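wash reagents, and reagents capable of detecting the presence of bound antibody or hybridized DF.

Using the isolated proteins of the invention, the invention further provides methods of obtaining and identifying agents that can bind to a protein encoded by one of the ORFs of the invention. Such agents include antibodies (described above), peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the invention; and (b) determining whether the agent binds to the protein.

The complete genome sequence of H. influenzae will be of great value to every laboratory working with this organism and for a variety of commercial purposes. Similarity searches against GenBank or protein databases will immediately identify many fragments of the H. influenzae Rd genome. These fragments will be of immediate value to Haemophilus researchers and of immediate commercial value for producing proteins or controlling gene expression.

PHA synthase is a specific example. Polyhydroxybutyrate is present in the membranes of H. influenzae Rd, and its amount has been reported to correlate with the level of competence required for transformation. The PHA synthase that makes this polymer has been identified and sequenced in a number of bacteria, but none of them is evolutionarily close to H. influenzae. The gene has yet to be isolated from H. influenzae with hybridization probes or PCR techniques. The genome sequence of the invention, however, makes it possible to identify the gene with the search means described below.

New methods and technologies for determining whole-genome sequences of bacteria and other small genomes have greatly improved, and will keep improving, our ability to analyze and understand chromosome organization. In particular, sequenced genomes will serve as models for developing tools to analyze chromosome structure and function. These include tools to identify genes within large segments of genomic DNA; to determine the structure, position and spacing of regulatory elements; to identify genes with potential industrial applications; and to perform comparative genomics and molecular phylogeny.

Description of the Drawings

Figure 1: Restriction map of the H. influenzae Rd genome.

Figure 2: Block diagram of a computer system 102 that can be used to implement the computer-based systems of the invention.

Figure 3: Experimental coverage of up to approximately 4,000 random sequence fragments assembled with AutoAssembler (squares), compared with Lander-Waterman predictions for a 2.5 Mb genome (triangles) and a 1.6 Mb genome (circles), assuming an average sequence length of 460 bp and a 25 bp overlap.

Figure 4: Data flow and computer programs used to manage, assemble, edit and annotate the H. influenzae genome. AB373 sequence data files are processed on both Macintosh and Unix platforms (Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society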
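Press, Washington, D.C., 585 (1993)). Factura (AB) is a Macintosh program for automatic vector sequence removal and end-trimming of sequence files. The program esp runs on the Macintosh and writes the feature data that Factura extracts from sequence files into a Unix-based H. influenzae relational database. For assembly, a specific set of sequence files and their associated features is retrieved with stp, an X-Windows graphical interface and control program that pulls sequences from the H. influenzae database using user-defined or standard SQL queries. The sequence files were assembled with the TIGR Assembler, an assembly engine designed at TIGR to assemble thousands of sequence fragments quickly and accurately. The TIGR Editor is a graphical interface that writes the aligned sequence files produced by the TIGR Assembler and displays the alignment and its electropherograms for contig editing.

Putative coding regions were identified with GeneMark (Borodovsky and McIninch, Computers Chem. 17(2):123 (1993)), a Markov and Bayesian modeling program for predicting gene locations, trained on the H. influenzae sequence data set. Peptide searches were run against the three reading frames of each GeneMark-predicted coding region using blaze (Brutlag et al., Computers Chem. 17:203 (1993)) on a MasPar MP-2 massively parallel computer with 4,096 microprocessors. The program mblzt combined the results from each reading frame into a single output file. Optimal protein alignments were obtained with praze, a program that extends alignments across potential frameshifts. The output was examined with gbyob, a custom graphical viewer that interacts directly with the H. influenzae database. These alignments were used to find potential frameshift errors, which were then targeted for further editing.

Figure 5: Circular representation of the H. influenzae Rd chromosome, showing the location of each predicted coding region with a database match and selected global features of the genome.
- Outer perimeter: positions of the unique NotI restriction site (designated nucleotide 1) and of the RsrII and SmaI sites.
- Outer concentric circle: location of each identified coding region that was assigned a gene identification, color-coded by role according to the color code of Figure 6.
- Second concentric circle: regions of high G/C content (>42%, red; >40%, blue) and high A/T content (>66%, black; >64%, green). High-G/C regions are associated especially with the six ribosomal operons and the Mu-like prophage.
- Third concentric circle: coverage by lambda clones (blue). More than 300 lambda clones were sequenced from each end to confirm the overall structure of the genome and to identify the six ribosomal operons.
- Fourth concentric circle: locations of the six ribosomal operons (green), the tRNAs (black) and the cryptic Mu-like prophage (blue).
- Fifth concentric circle: simple tandem repeats, at the locations of the repeats CTGGCT, GTCT, ATT, AATGGC, TTGA, TTGG, TTTA, TTATC, TGAC, TCGTC, AACC, TTGG, CAAT and CCAA.

The putative origin of replication is marked by outward-pointing arrows (green) beginning near base 603,000. Two potential termination sequences (red) are shown near the midpoint of the opposite side of the circle.

Figures 6(A) to 6(D): Complete map of the H. influenzae Rd genome. Predicted coding regions are shown on each strand; rRNA genes are drawn as lines and tRNA genes as triangles. Genes are color-coded by the role categories listed in the legend. Gene identification (GeneID) numbers correspond to those in Tables 1(a), 1(b) and 2, and three-letter designations are given where possible.

Figure 7: Comparison of the region of the H. influenzae chromosome that contains the eight genes of the fimbrial gene cluster in H. influenzae type b with the same region in H. influenzae Rd. In both organisms this region is flanked by the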
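pepN and purE genes. In the noninfectious Rd strain, however, the eight genes of the fimbrial gene cluster have been excised. In their place the Rd strain carries a 172 bp spacer region, again flanked by the pepN and purE genes.

Figure 8: Hydrophobicity analysis of five predicted channel proteins. The amino acid sequences of five predicted coding regions showed no homology to known peptide sequences (GenBank release 87). Each sequence displays multiple hydrophobic domains characteristic of channel-forming proteins. The sequences were analyzed with the Kyte-Doolittle algorithm (Kyte and Doolittle, J. Mol. Biol. 157:105 (1982)) over an 11-residue window, using the GeneWorks software package (IntelliGenetics).

Detailed Description of the Preferred Embodiments

The present invention is based on the sequencing of the Haemophilus influenzae Rd genome. The resulting primary nucleotide sequence is provided in SEQ ID NO:1. As used herein, "primary sequence" refers to a nucleotide sequence written in IUPAC nomenclature. The sequence in SEQ ID NO:1 is oriented relative to the unique NotI restriction endonuclease site in the H. influenzae Rd genome. A skilled artisan will readily recognize that this start/stop point was chosen for convenience and has no structural significance.

The invention provides the nucleotide sequence of SEQ ID NO:1, or a representative fragment of it, in a form that a skilled artisan can readily use, analyze and interpret. In one embodiment, the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence in SEQ ID NO:1.

As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NO:1" is any portion of SEQ ID NO:1 not currently represented in a publicly available database. Preferred representative fragments are H. influenzae open reading frames, expression modulating fragments, uptake modulating fragments, and fragments that can be used to diagnose the presence of H. influenzae Rd in a sample. Tables 1(a) and 2 identify such preferred fragments, without limitation.

The nucleotide sequence information in SEQ ID NO:1 was obtained by sequencing the H. influenzae Rd genome with a megabase shotgun sequencing method. Using the three accuracy parameters discussed in the Examples below, the inventors calculated the sequence of SEQ ID NO:1 to be up to 99.98% accurate. The sequence is therefore not necessarily 100% perfect, but it represents the nucleotide sequence of the H. influenzae Rd genome very accurately.

As discussed in detail below, a person of ordinary skill in the art can use the information in SEQ ID NO:1 and Tables 1(a) and 2, together with routine cloning and sequencing methods, to clone and sequence all the important "representative fragments". These include the open reading frames (ORFs) that encode a wide variety of H. influenzae proteins. In very rare cases, this may reveal an error in the nucleotide sequence disclosed in SEQ ID NO:1. Once the invention is available (that is, once the information in SEQ ID NO:1 and Tables 1(a) and 2 is available), resolving such a rare sequencing error in SEQ ID NO:1 is therefore well within the skill of the art. Nucleotide sequence editing software is publicly available. For example, AutoAssembler(TM) from Applied Biosystems (AB) can assist visual inspection of nucleotide sequences.

Even if every one of the very rare sequencing errors in SEQ ID NO:1 were corrected, the resulting sequence would still be at least 99.9% identical to SEQ ID NO:1. Genome sequences from different H. influenzae strains differ slightly. Even so, the genome sequence of any H. influenzae strain will be at least 99.9% identical to the nucleotide sequence in SEQ ID NO:1. The invention therefore further provides, in a form that a skilled artisan can readily use, analyze and interpret, nucleotide sequences at least 99.9% identical to the nucleotide sequence of SEQ ID NO:1.

Methods for determining whether a nucleotide sequence is at least 99.9% identical to SEQ ID NO:1 are routine and readily available to the skilled artisan. For example, the well-known fasta algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988)) gives the percent identity of nucleotide sequences.

Computer-Related Embodiments

The nucleotide sequence in SEQ ID NO:1, a representative fragment of it, or a nucleotide sequence at least 99.9% identical to SEQ ID NO:1 can be "provided" in a variety of media to facilitate its use. As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, that contains a nucleotide sequence of the invention, namely the sequence in SEQ ID NO:1, a representative fragment of it, or a sequence at least 99.9% identical to it. Such a manufacture presents the H. influenzae Rd genome or a subset of it (for example, an H. influenzae Rd open reading frame (ORF)) in a form a skilled artisan can examine. The means used for that examination are ones that cannot be applied directly to the genome, or a subset of it, as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the invention can be recorded on computer readable media. As used herein, "computer readable media" means any medium that a computer can read and access directly. Such media include, but are not limited to:
- magnetic storage media such as floppy disks, hard disk storage media and magnetic tape;
- optical storage media such as CD-ROM;
- electrical storage media such as RAM and ROM;
- hybrids of these categories, such as magnetic/optical storage media.

A skilled artisan can readily see how any currently known computer readable medium can be used to create a manufacture that carries a recorded nucleotide sequence of the invention.

As used herein, "recorded" refers to the process of storing information on computer readable media. A skilled artisan can readily adopt any currently known method for recording information on such media to make manufactures containing the nucleotide sequence information of the invention.

A variety of data storage structures are available for creating a computer readable medium with a nucleotide sequence of the invention recorded on it. The choice of structure generally depends on how the stored information will be accessed. A variety of data processor programs and formats can also be used to store the sequence information on computer readable media. The sequence information can be represented in a word processing text file, or formatted in commercially available software such as WordPerfect and Microsoft Word. It can also be represented as an ASCII file stored in a database application such as DB2, Sybase or Oracle. A skilled artisan can readily adapt any number of data processor structuring formats (for example, text files or databases) to obtain computer readable media carrying the nucleotide sequence information of the invention.

With the nucleotide sequence of SEQ ID NO:1, a representative fragment of it, or a sequence at least 99.9% identical to SEQ ID NO:1 in computer readable form, a skilled artisan can routinely access the sequence information for a variety of purposes. Software for accessing sequence information provided on a computer readable medium is publicly available. The following examples show how software running the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system can identify open reading frames (ORFs) within the H. influenzae Rd genome that are homologous to ORFs or proteins from other organisms.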
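A minimal sketch of recording sequence information as a plain ASCII text file follows. The Python example writes a sequence in the widely used FASTA layout (a ">" header line followed by fixed-width sequence lines) and reads it back. The file name, header text, 60-column line width and the helper names `write_fasta` and `read_fasta` are illustrative assumptions; the document does not prescribe any of them.

```python
from pathlib import Path
import tempfile


def write_fasta(path: Path, header: str, sequence: str, width: int = 60) -> None:
    """Record a nucleotide sequence as an ASCII FASTA text file (hypothetical helper)."""
    lines = [f">{header}"]
    # Wrap the sequence into fixed-width lines; the width is an assumption.
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    path.write_text("\n".join(lines) + "\n", encoding="ascii")


def read_fasta(path: Path) -> tuple[str, str]:
    """Read back a single-record FASTA file as (header, sequence)."""
    text = path.read_text(encoding="ascii").splitlines()
    header = text[0][1:]  # drop the leading '>'
    sequence = "".join(line.strip() for line in text[1:])
    return header, sequence.upper()


if __name__ == "__main__":
    # Toy sequence standing in for SEQ ID NO:1 (hypothetical data).
    seq = "GCGGCCGC" + "ATGAAACGCATTAGCACCACC" * 5
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "example_genome.fasta"
        write_fasta(out, "example_sequence", seq)
        header, restored = read_fasta(out)
        print(header, len(restored), restored == seq)
```

A flat text file was chosen here only because it is the simplest of the storage formats the text lists. A database application would serve the same purpose.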
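These ORFs are protein-encoding fragments of the H. influenzae Rd genome. They are useful for producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

The invention further provides systems, particularly computer-based systems, that contain the sequence information described herein. Such systems are designed to identify commercially important fragments of the H. influenzae Rd genome. As used herein, a "computer-based system" means the hardware means, software means and data storage means used to analyze the nucleotide sequence information of the invention. At minimum, the hardware means comprises a central processing unit (CPU), input means, output means and data storage means. A skilled artisan will readily appreciate that any currently available computer-based system is suitable for use in the invention.

As stated above, the computer-based systems of the invention comprise a data storage means holding a nucleotide sequence of the invention, together with the hardware and software needed to support and run a search means. As used herein, "data storage means" refers either to memory that can store the nucleotide sequence information of the invention, or to a memory access means that can access manufactures on which that information is recorded.

As used herein, "search means" refers to one or more programs, run on the computer-based system, that compare a target sequence or target structural motif with the sequence information held in the data storage means. Search means are used to identify fragments or regions of the H. influenzae Rd genome that match a particular target sequence or target motif. A variety of known algorithms have been publicly disclosed. A variety of commercially available software for conducting searches is used, and can be used, in the computer-based systems of the invention. Examples include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBIA). A skilled artisan will readily recognize that any available homology search algorithm or software package can be adapted for use in these systems.

As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan will readily recognize that the longer a target sequence is, the less likely it is to appear in the database by chance. The most preferred target length is about 10 to 100 amino acids or about 30 to 300 nucleotide residues. It is well recognized, however, that some commercially important fragments of the H. influenzae Rd genome are shorter. Examples are sequence fragments involved in gene expression and protein processing.

As used herein, a "target structural motif" or "target motif" refers to any rationally selected sequence or combination of sequences, chosen on the basis of the three-dimensional configuration formed when the target motif folds. A variety of target motifs are known in the art. Protein target motifs include, but are not limited to, enzymatic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).

A variety of structural formats can be used for the input and output means of the computer-based systems of the invention. A preferred output format ranks fragments of the H. influenzae Rd genome by their degree of homology to the target sequence or target motif. This presentation gives a skilled artisan a ranking of sequences that contain varying amounts of the target sequence or target motif, and it shows the degree of homology in each identified fragment.

A variety of comparing means can be used to compare a target sequence or target motif with the data storage means and so identify sequence fragments of the H. influenzae Rd genome. In the examples of the invention, open reading frames within the H. influenzae Rd genome were identified with software implementing the BLAST and BLAZE algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990)). A skilled artisan will readily recognize that any publicly available homology search program can serve as the search means for these systems.

Figure 2 shows one application of this embodiment. It is a block diagram of a computer system 102 that can be used to implement the invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may be, for example, a floppy disk drive, a CD-ROM drive or a magnetic tape drive. A removable storage medium 116 carrying recorded control logic and/or data (such as a floppy disk, compact disk or magnetic tape) can be inserted into the removable medium storage device 114. Once the medium is inserted, the computer system 102 uses appropriate software to read the control logic and/or data from the removable storage medium 116.

A nucleotide sequence of the invention may be stored in a well-known manner in the main memory 108, any of the secondary storage devices 110 and/or the removable storage medium 116. Software for accessing and processing the genome sequence (such as search tools and comparison tools) resides in main memory while running.

Biochemical Embodiments

Another embodiment of the invention is directed to isolated fragments of the H. influenzae Rd genome. These fragments include, but are not limited to, fragments that encode peptides (ORFs), fragments that modulate the expression of an operably linked ORF (EMFs), fragments that mediate the uptake of a linked DNA fragment into a cell (UMFs), and fragments that can be used to diagnose the presence of H. influenzae Rd in a sample (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the H. influenzae Rd genome" is a nucleic acid molecule with a specific nucleotide sequence that has been subjected to purification means. The purification reduces the number of compounds normally associated with it in the composition. A variety of purification means can be used to generate the isolated fragments of the invention. These include, but are not limited to, methods that separate the constituents of a solution by charge, solubility or size.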
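A toy version of the "search means" with ranked output described above is sketched below. It slides a target nucleotide sequence along a stored genome sequence, scores each ungapped window by percent identity, and reports the best-scoring windows first. This is a simplified stand-in chosen for illustration, not BLAST, BLAZE or MacPattern. It ignores gaps and the reverse strand, and the function name `ranked_hits` and its parameters are hypothetical.

```python
from typing import List, Tuple


def ranked_hits(genome: str, target: str, min_identity: float = 80.0,
                top: int = 5) -> List[Tuple[int, float]]:
    """Return (1-based start, percent identity) for the best ungapped matches.

    Every window of len(target) along `genome` is compared position by
    position with `target`. Windows below `min_identity` are discarded, and
    the rest are ranked from most to least similar.
    """
    genome, target = genome.upper(), target.upper()
    k = len(target)
    if k == 0 or k > len(genome):
        return []
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        matches = sum(a == b for a, b in zip(window, target))
        identity = 100.0 * matches / k
        if identity >= min_identity:
            hits.append((start + 1, identity))
    # Highest identity first; ties broken by position.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top]


if __name__ == "__main__":
    genome = "TTTTACGTACGTTTTTACGAACGTTTTT"  # hypothetical toy sequence
    for position, identity in ranked_hits(genome, "ACGTACGT", min_identity=75.0):
        print(position, identity)
```

The `min_identity` cutoff and the ranking correspond loosely to the "preferred output format" in the text, which ranks fragments by their degree of homology to the target.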
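In one embodiment, H. influenzae Rd DNA can be mechanically sheared into fragments 15 to 20 kb long. These fragments can then be inserted into lambda clones, as described in the Examples below, to create an H. influenzae Rd library. Next, the nucleotide sequence information in SEQ ID NO:1 can be used to design primers that flank, for example, an ORF listed in Table 1(a). PCR cloning, which is well known in the art, can then isolate the ORF from the lambda DNA library. Given SEQ ID NO:1, Table 1(a) and Table 2, isolating any ORF or other nucleic acid fragment of the invention is therefore routine. The isolated nucleic acid molecules of the invention include, but are not limited to, single-stranded and double-stranded DNA and single-stranded RNA.

As used herein, an "open reading frame" (ORF) is a series of triplets that code for amino acids without any termination codon, that is, a sequence that can be translated into protein.

Tables 1(a), 1(b) and 2 identify ORFs in the H. influenzae Rd genome. In particular, Table 1(a) gives the location of each ORF in the H. influenzae genome that encodes a listed protein. The listing is based on homology matches with protein sequences from the organisms shown in parentheses in column 4 of Table 1(a). The columns of Table 1(a) are as follows:
- Column 1 gives the "gene identification" number of a particular ORF. This number is useful for two reasons. First, the complete map of the H. influenzae Rd genome in Figures 6(A) to 6(D) labels ORFs by these numbers. Second, Table 1(b) uses the same numbers to show which ORFs were previously available in public databases.
- Columns 2 and 3 give the position of the ORF within the nucleotide sequence of SEQ ID NO:1. A person of ordinary skill in the art will recognize that ORFs may run in opposite directions in the H. influenzae genome, and columns 2 and 3 reflect this.
- Column 5 gives the percent identity between the protein encoded by the ORF and the corresponding protein from the organism in parentheses in column 4.
- Column 6 gives the percent similarity between the same two proteins.
- Column 7 gives the length of the amino acid homology match.

The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, take two polypeptides, each 10 amino acids long, that differ at three positions (say positions 1, 3 and 5). They are said to have 70% identity. If the amino acids at position 5, though not identical, are "similar" (that is, have similar biochemical properties), the same two polypeptides have 80% similarity.

Table 2 lists ORFs of the H. influenzae Rd genome whose encoded polypeptides produced no "homology match" with a known protein sequence from another organism. The algorithms and criteria used for the homology searches are described further in the Examples below.

A skilled artisan can readily identify ORFs in the H. influenzae Rd genome beyond those listed in Tables 1(a), 1(b) and 2. Examples are ORFs that overlap the identified ORFs or are encoded on the opposite strand, as well as ORFs found with the computer-based systems of the invention.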
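The percent identity and percent similarity described for columns 5 and 6 of Table 1(a) can be reproduced for the ten-residue example above with a short script. The grouping of "similar" amino acids is an assumption made for illustration, since the document does not say which residues count as similar. The two peptides are hypothetical sequences built to differ at positions 1, 3 and 5, with only position 5 being a similar substitution.

```python
# Hypothetical groups of biochemically similar residues (assumption for illustration).
SIMILAR_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("ST"), set("FYW"), set("NQ")]


def _similar(a: str, b: str) -> bool:
    """True if a and b are identical or fall in the same similarity group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(seq1: str, seq2: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned, equal-length peptides."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    identical = sum(a == b for a, b in zip(seq1, seq2))
    similar = sum(_similar(a, b) for a, b in zip(seq1, seq2))
    n = len(seq1)
    return 100.0 * identical / n, 100.0 * similar / n


# Positions 1, 3 and 5 differ; only position 5 (D vs E) is a "similar" substitution.
peptide_a = "GKAWDLMPST"
peptide_b = "PKRWELMPST"
print(identity_and_similarity(peptide_a, peptide_b))  # (70.0, 80.0)
```

Real alignment tools usually derive similarity from a substitution matrix rather than fixed groups. The fixed groups simply keep the arithmetic of the example visible.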
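As used herein, an "expression modulating fragment" (EMF) is a series of nucleotide molecules that modulates the expression of an operably linked ORF or EMF. A sequence is said to "modulate the expression of an operably linked sequence" when the presence of the EMF alters that sequence's expression. EMFs include, but are not limited to, promoters and promoter modulating sequences (inducible elements). One class of EMFs induces expression of an operably linked ORF in response to a specific regulatory factor or physiological event. Known Haemophilus EMFs are reviewed by Tomb et al. (Gene 104:1-10 (1991)) and Chandler, M.S. (Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992)).

EMF sequences can be located in the H. influenzae Rd genome by their proximity to the ORFs in Tables 1(a), 1(b) and 2. An intergenic segment about 10 to 200 nucleotides long taken 5' of any ORF in these tables, or a fragment of such a segment, will modulate the expression of an operably linked 3' ORF much as it does with its naturally linked ORF. As used herein, an "intergenic segment" is a fragment of the Haemophilus genome lying between two of the ORFs described herein. Alternatively, EMFs can be identified by using known EMFs as the target sequence or target motif in the computer-based systems of the invention.

The presence and activity of an EMF can be confirmed with an EMF trap vector. An EMF trap vector has a cloning site 5' of a marker sequence. The marker encodes an identifiable phenotype, such as antibiotic resistance or complementation of a nutritional auxotrophy, which can be detected or assayed when the vector is placed in a suitable host under suitable conditions. A more detailed discussion of various marker sequences appears below. A sequence suspected of being an EMF is cloned, in all three reading frames, into one or more restriction sites upstream of the marker sequence in the EMF trap vector. The vector is then transformed into a suitable host by known methods, and the phenotype of the transformed host is examined under suitable conditions. As described above, an EMF will modulate expression of the operably linked marker sequence.

As used herein, an "uptake modulating fragment" (UMF) is a series of nucleotide molecules that mediates uptake of a linked DNA fragment into a cell. UMFs can be readily identified by using known UMFs as the target sequence or target motif in the computer-based systems described above. The presence and activity of a UMF can be confirmed by linking the suspected UMF to a marker sequence. The resulting nucleic acid molecule is incubated with a suitable host under suitable conditions, and uptake of the marker sequence is measured. As described above, a UMF increases the frequency of uptake of the linked marker sequence. DNA uptake in Haemophilus is reviewed by Goodgal, S.H. et al., J. Bact. 172:5924-5928 (1990).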
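The proximity rule stated above for locating candidate EMFs can be sketched as follows. The rule calls for an intergenic segment of about 10 to 200 nucleotides taken 5' of an ORF. The sketch assumes that ORF coordinates are 1-based, inclusive (start, end) pairs in the manner of columns 2 and 3 of Table 1(a), and that start > end marks an ORF on the opposite strand. It ignores the circularity of the chromosome. The toy genome, the coordinates and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_segment(genome: str, orfs: List[Tuple[int, int]], index: int,
                     max_len: int = 200, min_len: int = 10) -> Optional[str]:
    """Return up to `max_len` nt immediately 5' of orfs[index], read in the
    ORF's own orientation and cut off at the nearest neighbouring ORF.

    Returns None if fewer than `min_len` intergenic nucleotides are available.
    """
    start, end = orfs[index]
    # (low, high) spans occupied by every other ORF.
    others = [(min(s, e), max(s, e)) for i, (s, e) in enumerate(orfs) if i != index]
    if start <= end:
        # Forward strand: the 5' region lies to the left of the start coordinate.
        left = max(1, start - max_len)
        for low, high in others:
            if low < start:
                left = max(left, high + 1)  # stay outside the neighbouring ORF
        segment = genome[left - 1:start - 1]
    else:
        # Opposite strand: the 5' region lies to the right of the start coordinate.
        right = min(len(genome), start + max_len)
        for low, high in others:
            if high > start:
                right = min(right, low - 1)
        segment = reverse_complement(genome[start:right])
    return segment if len(segment) >= min_len else None


if __name__ == "__main__":
    toy = "GGGGGCCCCC" + "TTTTTAAAAA" + "ATGAAATAA" + "CCCCCGGGGG"  # 39 nt
    print(upstream_segment(toy, [(21, 29)], 0, max_len=15))
```

Each returned segment is only a candidate. As the text explains, its activity would still need to be confirmed, for example with an EMF trap vector.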
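As used herein, a "diagnostic fragment" (DF) is a series of nucleotide molecules that hybridizes selectively to H. influenzae sequences. DFs can be readily identified in two ways: by finding sequences unique to the H. influenzae Rd genome, or by making probes or amplification primers from a candidate DF sequence and testing them in a suitable diagnostic format that measures amplification or hybridization selectivity.

Sequences within the scope of the invention are not limited to the specific sequences described herein; they also include allelic and species variants. Allelic and species variants can be routinely determined by comparing the sequence in SEQ ID NO:1, a representative fragment of it, or a sequence at least 99.9% identical to SEQ ID NO:1 with the corresponding sequence from another isolate of the same species. To accommodate codon variability, the invention also includes nucleic acid molecules that encode the same amino acid sequences as the specific ORFs disclosed herein. In other words, it is expressly contemplated that a codon in the coding region of an ORF may be replaced by another codon for the same amino acid.

Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (that is, on both strands). Alternatively, errors can be screened for by using part or all of the fragment in question as a probe or primer to isolate the corresponding H. influenzae polynucleotide, and then sequencing that polynucleotide.

The ORFs of the H. influenzae Rd genome disclosed in Tables 1(a), 1(b) and 2, and the EMFs found 5' of those ORFs, can be used as polynucleotide reagents in numerous ways. These sequences can serve as diagnostic probes or diagnostic amplification primers for detecting a specific microbe, such as H. influenzae Rd, in a sample. This applies especially to the fragments or ORFs of Table 2, which should be highly selective for H. influenzae.

In addition, the fragments of the invention, broadly described, can be used to control gene expression through triple helix formation or through antisense DNA or RNA. Both methods rely on binding of a polynucleotide sequence to DNA or RNA. Polynucleotides for these methods are usually 20 to 40 bases long and are designed to be complementary either to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation ideally shuts off transcription of RNA from the DNA, while antisense RNA hybridization blocks translation of the mRNA into polypeptide. Both techniques have been shown to work in model systems. The information contained in the sequences of the invention is needed to design antisense or triple helix oligonucleotides.

The invention further provides recombinant constructs comprising one or more fragments of the H. influenzae Rd genome of the invention. These constructs comprise a vector, such as a plasmid or viral vector, into which a fragment of H. influenzae Rd has been inserted in the forward or reverse orientation. A vector carrying one of the ORFs of the invention may also contain regulatory sequences, for example a promoter operably linked to the ORF. A vector carrying an EMF or UMF of the invention may also contain a marker sequence or a heterologous ORF operably linked to that EMF or UMF. Many suitable vectors and promoters are known to those of skill in the art and are commercially available for making the recombinant constructs of the invention. The following vectors are given as examples:
- Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia).
- Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).

Promoter regions can be selected from any desired gene using CAT (chloramphenicol transferase) vectors or other vectors with selectable markers. Two suitable vectors are pKK232-8 and pCM7. Specific bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda P_E and trc. Eukaryotic promoters include the CMV immediate early promoter, HSV thymidine kinase, early and late SV40, retroviral LTRs, and mouse metallothionein-I. Selecting a suitable vector and promoter is well within the ordinary skill of the art.

The invention further provides host cells containing any one of the isolated fragments of the H. influenzae Rd genome of the invention, introduced into the host cell by known transformation methods. The host cell can be a higher eukaryotic cell such as a mammalian cell, a lower eukaryotic cell such as a yeast cell, or a prokaryotic cell such as a bacterial cell. The recombinant construct can be introduced into the host cell by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L. et al., Basic Methods in Molecular Biology (1986)). Host cells carrying a fragment of the invention can be used in the conventional way. For an ORF, they produce the gene product encoded by the isolated fragment; for an EMF, they can produce a heterologous protein under the control of that EMF.

The invention further provides isolated polypeptides encoded by the nucleic acid fragments of the invention or by degenerate variants of those fragments. A "degenerate variant" is a nucleotide fragment whose sequence differs from that of a nucleic acid fragment of the invention (for example, an ORF) but which, owing to the degeneracy of the genetic code, encodes the same polypeptide sequence. The preferred nucleic acid fragments of the invention are the protein-encoding ORFs listed in Table 1(a).

A variety of methods known in the art can be used to obtain any of the isolated polypeptides or proteins of the invention. At the simplest level, the amino acid sequence can be synthesized on a commercially available peptide synthesizer. This is particularly useful for making small peptides and fragments of larger polypeptides. Such fragments are useful, for example, for raising antibodies against the native polypeptide. Alternatively, the polypeptide or protein can be purified from bacterial cells that produce it naturally. A person skilled in the art can readily follow known methods for isolating polypeptides and proteins to obtain any of the isolated polypeptides or proteins of the invention. These methods include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography and immunoaffinity chromatography.

The polypeptides and proteins of the invention can also be purified from cells that have been altered to express them. As used herein, a cell is "altered to express" a desired polypeptide or protein when genetic manipulation has made it produce a polypeptide or protein that it normally does not produce, or normally produces only at lower levels. A person skilled in the art can readily adapt procedures for introducing and expressing recombinant or synthetic sequences in eukaryotic or prokaryotic cells, so as to generate a cell that produces one of the polypeptides or proteins of the invention.

Any host/vector system can be used to express one or more of the ORFs of the invention. Such systems include, but are not limited to, eukaryotic hosts such as HeLa, CV-1, COS and Sf9 cells, and prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those that do not normally express the particular polypeptide or protein, or that express it only at a low natural level.

As used herein, "recombinant" means that a polypeptide or protein is derived from a recombinant (for example, microbial or mammalian) expression system. "Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (for example, yeast) expression systems. As a product, a "recombinant microbial" polypeptide or protein is essentially free of native endogenous substances and lacks the associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, such as E. coli, carry no glycosylation. Those expressed in yeast have a glycosylation pattern different from that produced in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. In general, the DNA segments encoding the polypeptides and proteins of this invention are assembled from fragments of the H. influenzae Rd genome and short oligonucleotide linkers, or from a series of oligonucleotides. The result is a synthetic gene that can be expressed in a recombinant transcriptional unit containing regulatory elements derived from a microbial or viral operon.

"Recombinant expression vehicle or vector" refers to a plasmid, phage, virus or other vector for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can contain a transcriptional unit assembled from:
1. one or more genetic elements with a regulatory role in gene expression, such as promoters or enhancers;
2. a structural or coding sequence that is transcribed into mRNA and translated into protein;
3. suitable transcription initiation and termination sequences.

Structural units intended for yeast or eukaryotic expression systems preferably include a leader sequence that allows the host cell to secrete the translated protein extracellularly. Alternatively, when a recombinant protein is expressed without a leader or transport sequence, it may carry an N-terminal methionine residue. This residue may or may not be cleaved from the expressed recombinant protein afterward to give the final product.

"Recombinant expression system" means host cells that carry a recombinant transcriptional unit, either stably integrated into chromosomal DNA or maintained extrachromosomally. The cells can be prokaryotic or eukaryotic. A recombinant expression system as defined here expresses heterologous polypeptides or proteins when the regulatory elements linked to the DNA segment or synthetic gene being expressed are induced.

Mature proteins can be expressed in mammalian cells, yeast, bacteria or other cells under the control of suitable promoters. Cell-free translation systems can also produce such proteins from RNAs derived from the DNA constructs of the invention. Cloning and expression vectors suitable for prokaryotic and eukaryotic hosts are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor, N.Y. (1989), the disclosure of which is incorporated herein by reference.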
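To illustrate the "degenerate variant" concept defined above, the sketch below translates two different ORF-like DNA sequences with the standard genetic code and shows that they encode the same peptide. The two sequences and the helper names are hypothetical. Translation simply stops at the first stop codon; it does not check reading frames or alternative start codons, and it accepts only A/C/G/T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG order (first base slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate an in-frame DNA sequence, stopping at the first stop codon ('*')."""
    orf = orf.upper()
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def is_degenerate_variant(orf_a: str, orf_b: str) -> bool:
    """True for different nucleotide sequences that encode the same peptide."""
    return orf_a.upper() != orf_b.upper() and translate(orf_a) == translate(orf_b)


if __name__ == "__main__":
    original = "ATGCTGAAAGGTTAA"  # hypothetical ORF: Met-Leu-Lys-Gly-stop
    variant = "ATGTTAAAGGGCTAG"   # synonymous codons after the start codon
    print(translate(original), translate(variant),
          is_degenerate_variant(original, variant))
```

The table is built from the conventional 64-letter encoding of the standard code so that every codon-to-amino-acid assignment can be checked against a printed codon chart.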
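Recombinant expression vectors generally include an origin of replication and selectable markers that allow transformation of the host cell, such as the E. coli ampicillin resistance gene and the S. cerevisiae TRP1 gene. They also include a promoter, derived from a highly expressed gene, that directs transcription of a downstream structural sequence. Such promoters can be derived from, among others, operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor, acid phosphatase, or heat shock proteins. The heterologous structural sequence is assembled in the proper phase with translation initiation and termination sequences. Preferably, it is also joined to a leader sequence that can direct secretion of the translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein that carries an N-terminal identification peptide with desired properties, for example stabilization or simpler purification of the expressed recombinant product.

Expression vectors for bacterial use are built by inserting a structural DNA sequence that encodes the desired protein, together with suitable translation initiation and termination signals, in operable reading phase with a functional promoter. The vector contains one or more phenotypic selectable markers and an origin of replication, which ensure that the vector is maintained and, if desired, amplified in the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium, and various species of the genera Pseudomonas, Streptomyces and Staphylococcus, although others may also be chosen.

As a representative but non-limiting example, expression vectors for bacterial use can contain a selectable marker and a bacterial origin of replication taken from commercially available plasmids that carry genetic elements of the well-known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with a suitable promoter and the structural sequence to be expressed. A suitable host strain is transformed and grown to an appropriate cell density. The selected promoter is then induced by suitable means (for example, a temperature shift or chemical induction), and the cells are cultured for a further period. The cells are typically harvested by centrifugation and disrupted by physical or chemical means, and the resulting crude extract is kept for further purification.

Various mammalian cell culture systems can also be used to express recombinant protein. Examples include the COS-7 line of monkey kidney fibroblasts described by Gluzman, Cell 23:175 (1981), and other cell lines able to express a compatible vector, such as the C127, 3T3, CHO, HeLa and BHK lines. Mammalian expression vectors contain an origin of replication, a suitable promoter and enhancer, and any necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, such as the SV40 origin, early promoter, enhancer, splice and polyadenylation sites, can supply the required nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by an initial extraction from cell pellets, followed by one or more salting-out, aqueous ion-exchange or size-exclusion chromatography steps. Protein refolding steps can be used as needed to complete the configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be used for the final purification steps. Microbial cells used to express the proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or cell lysing agents.

The invention further includes isolated polypeptides, proteins and nucleic acid molecules that are substantially equivalent to those described herein. As used herein, "substantially equivalent" can refer to nucleic acid or amino acid sequences, for example mutant sequences, that differ from a reference sequence by one or more substitutions, deletions or additions whose net effect does not cause an adverse functional difference between the reference and subject sequences. For the purposes of this invention, sequences with equivalent biological activity and equivalent expression characteristics are considered substantially equivalent. Truncation of the mature sequence is disregarded when determining equivalence.

The invention further provides methods of obtaining, from other strains of H. influenzae, homologs of the fragments of the H. influenzae Rd genome of the invention and of the proteins encoded by its ORFs. As used herein, an H. influenzae sequence or protein is a homolog of a fragment of the H. influenzae Rd genome of the invention, or of a protein encoded by one of its ORFs, if it shares significant homology with that fragment or protein. Specifically, a person skilled in the art can obtain homologs by using the sequences disclosed herein as probes or primers, together with techniques such as PCR cloning and colony/plaque hybridization.

As used herein, two nucleic acid molecules or proteins "share significant homology" if they contain regions with greater than 85% sequence (amino acid or nucleic acid) homology.

Region-specific primers or probes derived from the nucleotide sequence in SEQ ID NO:1, or from a sequence at least 99.9% identical to it, can be used to prime DNA synthesis and RNA amplification. Together with known methods (Innis et al., PCR Protocols, Academic Press, San Diego, CA (1990)), they can also be used to identify colonies that contain cloned DNA encoding a homolog.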
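The definition of "share significant homology" given above (two sequences containing a region with greater than 85% sequence homology) can be made operational in several ways. One minimal sketch checks whether any window exceeds the 85% threshold. It assumes that the two sequences are already aligned position by position without gaps, that simple identity stands in for "homology", and that windows are a hypothetical fixed length of 20 positions. None of these choices comes from the document.

```python
def shares_significant_homology(seq1: str, seq2: str, window: int = 20,
                                threshold: float = 85.0) -> bool:
    """True if some aligned window of `window` positions exceeds `threshold` % identity.

    Assumes seq1 and seq2 are pre-aligned without gaps; compares up to the
    shorter length. Window length and identity measure are illustrative choices.
    """
    n = min(len(seq1), len(seq2))
    if n < window:
        return False
    matches = [a == b for a, b in zip(seq1[:n].upper(), seq2[:n].upper())]
    count = sum(matches[:window])  # identities in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one position: add the new column, drop the old one.
            count += matches[start + window - 1] - matches[start - 1]
        if 100.0 * count / window > threshold:
            return True
    return False


if __name__ == "__main__":
    reference = "A" * 40
    candidate = "A" * 17 + "CCC" + "A" * 20  # 85% in the first window, 100% later
    print(shares_significant_homology(reference, candidate))
```

Because the definition uses a strict "greater than 85%", a window with exactly 17 of 20 matching positions (85.0%) does not qualify on its own.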
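With primers derived from SEQ ID NO:1, or from a sequence at least 99.9% identical to it, a person skilled in the art will recognize that high stringency conditions (for example, annealing at 50-60°C) amplify only sequences 75% or more homologous to the primer. Lower stringency conditions (for example, annealing at 35-37°C) also amplify sequences 40-50% or more homologous to the primer.

DNA probes derived from SEQ ID NO:1, or from a sequence at least 99.9% identical to it, can be used for colony/plaque hybridization. A person skilled in the art will recognize that high stringency conditions (for example, hybridization at 50-65°C in 5X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC) yield sequences with regions 90% or more homologous to the probe. Lower stringency conditions (for example, hybridization in 5X SSPC and 40-45% formamide, and washing at 42°C in SSPC) yield sequences with regions 35-45% or more homologous to the probe.

Any organism can serve as a source of homologs of the invention, provided it naturally expresses such a protein or contains a gene encoding one. The most preferred organisms for isolating homologs are bacteria closely related to H. influenzae Rd.

Uses of the Compositions of the Invention

Each ORF in Table 1(a) was assigned to one of 102 biological role categories adapted from Riley, M. (Microbiological Reviews 57(4):862 (1993)). This lets the skilled artisan determine a use for each identified coding sequence. Table 1(a) also identifies the type of peptide encoded by each ORF. A person skilled in the art can therefore use the polypeptides of the invention for commercial, therapeutic and industrial purposes consistent with their putative identification. Such identifications let a person skilled in the art use an H. influenzae ORF in the same way as the known sequences it matches, for example to ferment a particular sugar source or to produce a particular metabolite. (For reviews of enzymes used in commercial industry, see Biochemical Engineering and Biotechnology Handbook, 2nd ed., Macmillan Publ. Ltd., New York (1991), and Biocatalysts in Organic Syntheses, J. Tramper et al. (eds.), Elsevier Science Publishers, Amsterdam, The Netherlands (1985).)

1. Biosynthetic Enzymes

Open reading frames encoding proteins that mediate catalytic reactions can be used for industrial biosynthesis. These include reactions of intermediary and macromolecular metabolism, of small-molecule biosynthesis, and of cellular processes and other functions. Examples are enzymes involved in:
- degradation of intermediary products of metabolism;
- central intermediary metabolism;
- respiration;
- fermentation (both aerobic and anaerobic enzymes);
- ATP-proton motive force interconversion;
- broad regulatory functions;
- amino acid synthesis;
- nucleotide synthesis;
- cofactor and vitamin synthesis.

The metabolic pathways present in Haemophilus can be identified from its absolute nutritional requirements and by examining the enzymes identified in Table 1(a). Many of the proteins encoded by ORFs that Table 1(a) places in the intermediary metabolism category are involved specifically in degrading intermediary metabolites and in non-macromolecular metabolism. Among the enzymes identified are amylase, glucose oxidase and catalase.

Proteolytic enzymes are another commercially important class of enzymes. They are used in many industrial processes, including:
- processing flax and other plant fibers;
- extracting, clarifying and depectinizing fruit juices;
- extracting vegetable oils;
- softening fruits and vegetables to obtain unicellular products.

Proteolytic enzymes used in the food industry are reviewed in detail by Rombouts et al. (Symbiosis 21:79 (1986)) and Voragen et al. (Biocatalysts in Agricultural Biotechnology, J.R. Whitaker et al. (eds.), American Chemical Society Symposium Series 389:93 (1989)).

The metabolism of glucose, galactose, fructose and xylose is an important part of the primary metabolism of Haemophilus. Enzymes that degrade these sugars can be used in industrial fermentation. Commercially important sugar-converting enzymes include sugar isomerases such as glucose isomerase. Other metabolic enzymes have also found commercial use, such as glucose oxidase, which produces ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid by the Reichstein process (see Krueger et al., Biotechnology 6(A), Rhine, H.J. (ed.), Verlag Press, Weinheim, Germany (1984)).

Glucose oxidase (GOD) is commercially available and has been used in purified and immobilized forms to remove oxygen from beer (see Hartmeir et al., Biotechnology Letters 1:21 (1979)). The most important application of GOD is the industrial-scale fermentation of gluconic acid. Gluconic acid is used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries (see Bigelis, Gene Manipulations and Fungi, Bennett, J.W. et al. (eds.), Academic Press, New York (1985), p. 357). Beyond industry, GOD is used in medicine to measure glucose in body fluids. More recently it has been used in biotechnology to analyze syrups made from starch and cellulose hydrolysates (see Owusu et al., Biochim. et Biophys. Acta 872:83 (1986)).

The main sweetener used in the world today is sucrose, obtained from sugar beet and sugar cane. Among industrial enzymes, the glucose isomerase process has the largest market presence today. Soluble enzymes were used at first, and immobilized enzymes were developed later (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, MA (1990)). Today, using immobilized enzymes to make high-fructose syrup from glucose is by far the largest industrial enzyme business. The industrial use of these enzymes is reviewed by Jorgensen (Starch 40:307 (1988)).

Proteinases such as the alkaline serine proteinases are used as detergent additives and therefore account for one of the largest volumes of microbial enzymes used in industry. Because of their industrial importance, there is a large body of published and unpublished information on their use in industrial processes (see Faultman et al., Acid Proteases: Structure, Function and Biology, Tang, J. (ed.), Plenum Press, New York (1977); Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983); and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).

Another class of commercially usable proteins of the invention is the microbial lipases identified in Table 1 (see Macrae et al., Philosophical Transactions of the Chiral Society of London 310:227 (1985), and Poserke, Journal of the American Oil Chemist Society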
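61:1758 (1984)). Lipases are used mainly in the oils and fats industry, where lipase-catalyzed interesterification of readily available triglycerides produces neutral glycerides. They are also used as detergent additives that help remove fats from fabrics during washing.

Enzymes, particularly microbial enzymes, are rapidly gaining popularity as catalysts for key steps in the synthesis of complex organic molecules. One area of great interest is the preparation of chiral intermediates. Chiral intermediates matter to a wide range of synthetic chemists, especially those who prepare new pharmaceuticals, agrochemicals, fragrances and flavors (see Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, FL (1990)). Enzyme-catalyzed reactions of interest to organic chemists include:
- hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles;
- esterification and trans-esterification reactions;
- synthesis of amides;
- reduction of alkanones and oxoalkanoates;
- oxidation of alcohols to carbonyl compounds and of sulfides to sulfoxides;
- carbon-bond-forming reactions such as the aldol reaction.

When an enzyme encoded by an ORF of the invention is considered for biotransformation or organic synthesis, the advantages and disadvantages of using a whole microorganism rather than the isolated enzyme sometimes need to be weighed. Bud et al. (Chemistry in Britain (1987), p. 127) discuss in detail the pros and cons of whole-cell systems versus isolated, partially purified enzymes.

Aminotransferases, enzymes involved in amino acid biosynthesis and metabolism, are useful for the catalytic production of amino acids. Microbial enzyme systems have two advantages here: aminotransferases catalyze the stereoselective synthesis of L-amino acids only, and they generally have uniformly high catalytic rates. The use of aminotransferases for amino acid production is described by Roselle-David (Methods of Enzymology 136:479 (1987)).

Another useful category of proteins encoded by the ORFs of the invention includes enzymes involved in nucleic acid synthesis, repair and recombination. Many commercially important enzymes have already been isolated from members of the genus Haemophilus, including the restriction endonucleases HincII, HindIII and HinfI. Table 1(a) identifies a wide range of enzymes with direct applications in the biotechnology industry, such as restriction enzymes, ligases, gyrases and methylases.

2. Production of Antibodies

As described herein, the proteins of the invention and their homologs can be used in a variety of procedures and methods known in the art that are currently applied to other proteins. The proteins of the invention can also be used to raise antibodies that bind them selectively. Such antibodies can be monoclonal or polyclonal, and they include fragments and humanized forms of these antibodies. The invention further provides antibodies that selectively bind one of the proteins of the invention and hybridomas that produce these antibodies. A hybridoma is an immortalized cell line capable of secreting a specific monoclonal antibody.

Techniques for preparing polyclonal and monoclonal antibodies, and hybridomas that produce a desired antibody, are generally well known in the art (Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)). The same is true of the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan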
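R. Liss, Inc. (1985), pp. 77-96).

Any animal known to produce antibodies (mouse, rabbit, etc.) can be immunized with a pseudogene polypeptide. Methods of immunization are well known in the art and include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of protein encoded by an ORF of the present invention used for immunization will vary with the animal being immunized, the antigenicity of the peptide, and the site of injection.

The protein used as an immunogen may be modified, or administered in an adjuvant, to increase its antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen to a heterologous protein (such as globulin or β-galactosidase) or including an adjuvant during immunization.

For monoclonal antibodies, spleen cells are removed from the immunized animal, fused with myeloma cells such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody-producing hybridoma cells. Any one of a number of methods well known in the art can be used to identify hybridoma cells producing an antibody with the desired characteristics. These include screening the hybridomas by ELISA, western blot analysis or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)). Hybridomas secreting the desired antibodies are cloned, and the class and subclass are determined using procedures known in the art (Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)). Techniques described for producing single-chain antibodies (U.S. Patent 4,946,778) can be adapted to produce single-chain antibodies against the proteins of the present invention.

For polyclonal antibodies, antibody-containing antiserum is isolated from the immunized animal and screened for the presence of antibodies with the desired specificity using one of the procedures described above.

The present invention further provides the above-described antibodies in detectably labeled form. Antibodies can be detectably labeled using radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, and the like. Procedures for accomplishing such labeling are well known in the art; see, for example, Sternberger, L.A. et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E.A. et al., Meth. Enzym. 62:308 (1978); Engval, E. et al., Immunol. 109:129 (1972); Goding, J.W., J. Immunol. Meth. 13:215 (1976). The labeled antibodies of the present invention can be used in in vitro, in vivo and in situ assays to identify cells or tissues in which a fragment of the Haemophilus influenzae Rd genome is expressed.

The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and Sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D.M. et al., "Handbook of Experimental Immunology," 4th ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W.D. et al., Meth. Enzym. 34, Academic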
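Press, New York (1974)). The immobilized antibodies of the present invention can be used in in vitro, in vivo and in situ assays, as well as in immunoaffinity purification of the proteins of the present invention.

3. Diagnostic Assays and Kits

The present invention further provides methods of identifying the expression of one of the ORFs of the present invention, or a homolog thereof, in a test sample using one of the DFs or antibodies of the present invention. Specifically, such methods comprise incubating a test sample with one or more antibodies or one or more DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.

Conditions for incubating a DF or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection method used, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G.R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); and Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma or urine. The test sample used in the above-described method will vary with the assay format, the nature of the detection method, and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein or membrane extracts of cells are well known in the art and can readily be adapted to obtain a sample compatible with the system used.

In another embodiment of the present invention, kits are provided that contain the reagents necessary to carry out the assays of the present invention. Specifically, the invention provides a compartmentalized kit that receives, in close confinement, one or more containers comprising: (a) a first container comprising one of the DFs or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following reagents: wash reagents, and reagents capable of detecting the presence of a bound DF or antibody.

Specifically, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers, or strips of plastic or paper. They allow reagents to be transferred efficiently from one compartment to another without cross-contamination of samples and reagents, and allow the agents or solutions of each container to be added quantitatively from one compartment to another. Such containers will include a container that accepts the test sample, a container that holds the antibodies used in the assay, containers that hold wash reagents (such as phosphate-buffered saline, Tris buffers, etc.), and containers that hold the reagents used to detect the bound antibody or DF.

Types of detection reagents include labeled nucleic acid probes, labeled secondary antibodies or, alternatively, where the primary antibody is labeled, enzymatic or antibody-binding reagents capable of reacting with the labeled antibody. One skilled in the art will readily recognize that the DFs and antibodies disclosed in the present invention can readily be incorporated into one of the established kit formats well known in the art.

4. Screening Assays for Binding Agents

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents that bind to a protein encoded by one of the ORFs of the present invention, or to one of the fragments of the Haemophilus genome described herein. Specifically, the method comprises the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or with an isolated fragment of the Haemophilus genome; and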
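(b) determining whether the agent binds to said protein or said fragment.

The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives or other pharmaceutical agents. These agents can be selected and screened at random, or rationally selected or designed using protein modeling techniques.

For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and assayed for their ability to bind to the protein encoded by the ORF of the present invention.

Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when it is chosen on the basis of the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence, thereby producing rationally designed antipeptide peptides (see, for example, Hurby et al., Application of Synthetic Peptides: Antisense Peptides, in Synthetic Peptides, A User's Guide, W.H. Freeman, New York (1992), pp. 288-307, and Kaspczak et al., Biochemistry 28:9230-9238 (1989)) or pharmaceutical agents and the like.

In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression by binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed or selected. Targeting an ORF or EMF allows the skilled artisan to design sequence-specific or element-specific agents that modulate the expression of either a single ORF or multiple ORFs that rely on the same EMF for expression control.

One class of DNA-binding agents are agents that contain base residues that hybridize to DNA or RNA, or form a triple helix by binding to it. Such agents can be based on the classic phosphodiester ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives that have base-attachment capacity. Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary either to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988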
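)). Triple-helix formation shuts off transcription of RNA from DNA, whereas antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been shown to be effective in model systems. The information contained in the sequences of the present invention is necessary for the design of antisense or triple-helix oligonucleotides and other DNA-binding agents.

Agents that bind to a protein encoded by one of the ORFs of the present invention can be used as diagnostic agents, and in the control of bacterial infection by modulating the activity of the protein encoded by the ORF. Such agents can be formulated using known techniques to produce a pharmaceutical composition for use in controlling Haemophilus growth and infection.

5. Vaccines and Pharmaceutical Compositions

The present invention further provides pharmaceutical agents that can be used to modulate the growth of H. influenzae, or of another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition that can be formulated using known techniques to provide a pharmaceutical composition. As used herein, "the pharmaceutical agents of the present invention" refers to pharmaceutical agents that are derived from a protein encoded by an ORF of the present invention, or that are agents identified using the assays described herein.

As used herein, a pharmaceutical agent is said to "modulate the growth of a Haemophilus species or a related organism in vivo or in vitro" when it reduces the growth rate, division rate or viability of the organism in question. Although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention, these agents can modulate the growth of an organism in a number of ways. Some agents modulate growth by binding to an important protein and thereby blocking its biological activity, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more susceptible to the host's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of vaccines based on outer membrane components such as LPS are well known in the art.

As used herein, a "related organism" is any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism contains a homolog of the protein that is the target of the pharmaceutical agent, or of the protein used as a vaccine. As such, a related organism need not be bacterial and can be a fungal or viral pathogen.

The pharmaceutical agents and compositions of the present invention may be administered in a conventional manner, for example by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount effective for the treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 10 μg/kg body weight and, in most cases, in an amount not exceeding about 8 mg/kg body weight per day. In most cases, taking into account the route of administration, symptoms and the like, the dosage is from about 10 μg/kg to about 1 mg/kg body weight per day.

The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties that are not normally part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half-life and the like. Alternatively, the moieties may decrease the toxicity of the molecule, or eliminate or attenuate undesirable side effects of the molecule. Moieties capable of mediating such effects are disclosed in Remington's Pharmaceutical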
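Sciences (1980).

For example, changes in the immunological character of a functional derivative, such as its affinity for a given antibody, are measured by a competitive-type immunoassay. Changes in immunomodulatory activity are measured by an appropriate assay. Modifications of protein properties such as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation, or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.

The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., inhalation, or intravenous, intramuscular, subcutaneous, enteral or parenteral administration). It is preferred to administer the agent so as to achieve an effective concentration in the blood or tissue in which the growth of the organism is to be controlled. The preferred method of achieving an effective blood concentration is to administer the agent by injection. Administration may be by continuous infusion, or by single or multiple injections.

In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending on factors such as the patient's age, weight, height, sex, general medical condition and previous medical history. In general, it is desirable to provide the recipient with a dosage in the range of about 1 pg/kg to 10 mg/kg (body weight of the patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or of other agents.

As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound, can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.

The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the growth rate (as defined above) of the target organism.

The administration of the agent(s) of the invention may be for either a "prophylactic" or a "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptom indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to the mammal in a pharmaceutically acceptable form and in a therapeutically effective amount. A composition is said to be "pharmaceutically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of the recipient patient.

The agents of the present invention can be formulated according to known methods for preparing pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, including other human proteins such as human serum albumin, are described, for example, in Remington's Pharmaceutical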
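Sciences (16th ed., Osol, A., ed., Mack, Easton, PA (1980)). To form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more agents of the present invention together with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action. Controlled-release preparations may be achieved by using polymers to complex or absorb one or more agents of the present invention. Controlled delivery may be exercised by selecting appropriate macromolecules (for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylene-vinyl acetate, methylcellulose, carboxymethylcellulose or protamine sulfate), the concentration of the macromolecules, and the method of incorporation used to control release. Another possible method of controlling the duration of action by controlled-release preparations is to incorporate the agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene-vinyl acetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, the materials can be entrapped in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization (for example, hydroxymethylcellulose or gelatin microcapsules and poly(methyl methacrylate) microcapsules, respectively), or in colloidal drug delivery systems (for example, liposomes, albumin microspheres, microemulsions, nanoparticles and nanocapsules), or in macroemulsions. Such techniques are described in Remington's Pharmaceutical Sciences (1980).

The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice, in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which reflects approval by the agency of manufacture, use or sale for human administration. In addition, the agents of the present invention may be used in conjunction with other therapeutic compounds.

6. Shotgun Approach to Megabase DNA Sequencing

The present invention further demonstrates, for the first time, that a random shotgun approach can be used to sequence stretches of DNA larger than one megabase. This approach, described in detail in the Examples below, eliminates the up-front cost of isolating and ordering overlapping or contiguous subclones before the sequencing protocol is begun.

The following Examples describe certain features of the present invention in greater detail, but are not intended to limit it.

EXAMPLES

Experimental Design and Methods

1. Shotgun Sequencing Strategy

The overall strategy of the shotgun approach to whole-genome sequencing is outlined in Table 3. The theory of shotgun sequencing follows from Lander and Waterman's application (Lander and Waterman, Genomics 2:231 (1988)) of the Poisson distribution $p_x = m^x e^{-m}/x!$, where $x$ is the number of occurrences of an event, $m$ is the mean number of occurrences, and $p_x$ is the probability of exactly $x$ occurrences; in particular, $p_0$ is the probability that a given base has not been sequenced after a given amount of random sequence has been generated. If $L$ is the genome length, $n$ is the number of clone insert ends sequenced, and $w$ is the length of a sequencing read, then $m = nw/L$, and the probability that no read starts within the $w$ bases preceding a given base, i.e., the probability that the base is not sequenced, is $p_0 = e^{-m}$. Using fold coverage as the unit of $m$, after 1.8 Mb of sequence has been randomly generated, $m = 1$, representing 1X coverage. In this case $p_0 = e^{-1} \approx 0.37$, so approximately 37% of the genome is unsequenced. For example, 5X coverage (approximately 9,500 clones sequenced from both insert ends, with an average read length of 460 bp) gives $p_0 = e^{-5} \approx 0.0067$, or 0.67% unsequenced. The total gap length is $L e^{-m}$ and the average gap size is $L/n$; since the expected number of gaps is $n e^{-m}$, 5X coverage leaves about 128 gaps averaging about 100 bp in size. This treatment is essentially that of Lander and Waterman (Genomics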
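2:231 (1988)). Table 4 shows the coverage for a 1.9 Mb genome with an average fragment size of 460 bp.

2. Random Library Construction

To approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. The following library construction procedure was developed to achieve this.

H. influenzae Rd KW20 DNA was prepared by phenol extraction. A mixture (3.3 ml) containing 600 μg of DNA, 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA and 30% glycerol was sonicated for 1 min at 0°C at the lowest energy setting with a 3-mm probe (Branson Model 450 Sonicator). The DNA was ethanol-precipitated and redissolved in 500 μl of TE buffer. To create blunt ends, a 100-μl aliquot was digested with 5 units of BAL31 nuclease (New England BioLabs) in 200 μl of BAL31 buffer for 10 min at 30°C. The DNA was phenol-extracted, ethanol-precipitated, redissolved in 100 μl of TE buffer and electrophoresed on a 1.0% low-melting agarose gel; the 1.6- to 2.0-kb size fraction was excised, phenol-extracted and redissolved in 20 μl of TE buffer.

A two-step ligation procedure produced a plasmid library in which 97% of clones carried inserts, >99% of them single inserts. The first ligation mixture (50 μl) contained 2 μg of DNA fragments, 2 μg of SmaI/BAP pUC18 DNA (Pharmacia) and 10 units of T4 ligase (GIBCO/BRL), and was incubated for 4 h at 14°C. After phenol extraction and ethanol precipitation, the DNA was dissolved in 20 μl of TE buffer and electrophoresed on a 1.0% low-melting agarose gel. A ladder of ethidium bromide-stained linear bands, identified by size as insert (i), vector (v), v+i, v+2i, v+3i, ..., was visualized with 360-nm UV light, and the v+i DNA was excised and recovered in 20 μl of TE. The v+i DNA was blunt-ended by treatment with T4 polymerase for 5 min at 37°C in a reaction mixture (50 μl) containing the v+i linears, 500 μM of each of the four dNTPs and 9 units of T4 polymerase (New England BioLabs), under the recommended buffer conditions. After phenol extraction and ethanol precipitation, the recovered v+i linears were dissolved in 20 μl of TE. The final, circularizing ligation was carried out overnight at 14°C in a 50-μl reaction containing 5 μl of v+i linears and 5 units of T4 ligase. After 10 min at 70°C, the reaction mixture was stored at −20°C.

This two-step procedure yielded a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras (<1%) or free vector (<3%). Because deviation from randomness is most likely to arise during cloning, E. coli host cells deficient in all recombination and restriction functions (A. Greener, Strategies 3(1):5 (1990)) were used to prevent rearrangements, deletions and loss of clones by restriction. Transformed cells were plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase, which allows multiplication and selection of the most rapidly growing cells.

Plating was carried out as follows. A 100-μl aliquot of Epicurian coli SURE II Supercompetent Cells (Stratagene 200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7-μl aliquot of 1.4 M β-mercaptoethanol was added to the cell aliquot to a final concentration of 25 mM, and the cells were incubated on ice for 10 min. A 1-μl aliquot of the final ligation was added to the cells, which were incubated on ice for 30 min, heat-pulsed for 30 s at 42°C, and returned to ice for 2 min. To minimize the preferential growth of any given transformed cell, the outgrowth period in liquid culture was eliminated from this protocol. Instead, the transformants were plated directly onto nutrient-rich SOB plates containing a 5-ml bottom layer of SOB agar (1.5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl and 1.5% Difco Agar per liter). The 5-ml bottom layer is supplemented with 0.4 ml of ampicillin (50 mg/ml) per 100 ml of SOB agar. The 15-ml top layer of SOB agar is supplemented with 1 ml of X-Gal (2%), 1 ml of MgCl₂ (1 M) and 1 ml of MgSO₄ per 100 ml of SOB agar. The 15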
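ml top layer was poured just before plating. Our titer was approximately 100 colonies per 10-μl aliquot of transformation. All colonies were picked for template preparation regardless of size. Thus only clones lost because of "poisonous" DNA or deleterious gene products would be missing from the library, resulting in only a slight increase in the number of gaps over that expected.

To evaluate the quality of the H. influenzae library, sequence data were obtained from approximately 4,000 templates using the M13-21 primer. After 1,300, 1,800, 2,500, 3,200 and 3,800 sequence fragments had been obtained, the random sequence fragments were assembled using AutoAssembler® software (Applied Biosystems Division of Perkin-Elmer (AB)), and the number of unique assembled base pairs was determined. Based on the equations described above, an ideal plot of the number of unsequenced base pairs as a function of the number of sequence fragments obtained, with an average read length of 460 bp, was determined for genomes of 2.5 × 10⁶ and 1.9 × 10⁶ bp (Figure 3). The progress of the assembly was plotted from the actual data obtained by assembling up to 3,800 sequence fragments and compared with the ideal plot (Figure 3). Figure 3 shows that the actual assembly data did not deviate significantly from the ideal plot, indicating that we had constructed a library closely approximating an ideal random library, with minimal contamination from double-insert chimeras and vector-only clones.

3. Random DNA Sequencing

High-quality double-stranded DNA plasmid templates (19,687) were prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corporation (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). All steps of plasmid preparation, from bacterial growth through final DNA purification, were performed in a 96-well format. Template concentration was determined using Hoechst dye and a Millipore Cytofluor. DNA concentrations were not adjusted, but low-yielding templates were identified where possible and were not sequenced.

Templates were also prepared from two H. influenzae lambda genomic libraries. An amplified library was constructed in the vector lambda GEM-12 (Promega), and an unamplified library was constructed in lambda DASH II (Stratagene). For the unamplified lambda library, H. influenzae Rd KW20 DNA (>100 kb) was partially digested for 6 min at 23°C in a reaction mixture (200 μl) containing 50 μg of DNA, 1X Sau3AI buffer and 20 units of Sau3AI. The digested DNA was phenol-extracted and electrophoresed on a 0.5% low-melting agarose gel at 2 V/cm for 7 h. Fragments of 15 to 25 kb were excised and recovered in a final volume of 6 μl. One microliter of the fragments was used with 1 μl of DASH II vector (Stratagene) in the recommended ligation reaction. One microliter of the ligation mixture was used per packaging reaction, following the recommended protocol for Gigapack II XL Packaging Extract (Stratagene, #227711). Phage were plated directly from the packaging mixture without amplification (after dilution with 500 μl of the recommended SM buffer and chloroform treatment). The yield was about 2.5 × 10³ pfu/μl. The amplified library was prepared essentially as above, except that the lambda GEM-12 vector was used. After packaging, about 3.5 × 10⁴ pfu were plated on the restrictive NM539 host. The lysate was harvested in 2 ml of SM buffer and stored frozen in 7% dimethyl sulfoxide. The phage titer was approximately 1 × 10⁹ pfu/ml. Liquid lysates (10 ml) were prepared from randomly selected plaques, and templates were prepared on an anion-exchange resin (Qiagen).

Sequencing reactions were carried out on plasmid templates with an ABI Catalyst LabStation, using Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator sequencing reactions were carried out on the lambda templates in a Perkin-Elmer 9600 thermocycler using Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing Kits. T7 and SP6 primers were used to sequence the insert ends from the lambda GEM-12 library, and T7 and T3 primers were used to sequence the insert ends from the lambda DASH II library. Eight individuals carried out the sequencing reactions (28,643) over three months, using an average of 14 AB 373 DNA sequencers per day. All sequencing reactions were analyzed using the Stretch modification of the AB 373, primarily with a 34-cm well-to-read distance. The overall sequencing success rate was 84% for M13-21 sequences, 83% for M13RP1 sequences and 65% for dye terminator reactions. The average usable read length was 485 bp for M13-21 sequences, 444 bp for M13RP1 sequences and 375 bp for dye terminator reactions. Table 5 summarizes the high-throughput sequencing phase of the present invention.

Richards et al. (Richards et al., in Automated DNA Sequencing and Analysis, M.D. Adams, C. Fields and J.C. Venter, eds. (Academic Press, London, 1994), Chapter 28) described the value of using sequences from both ends of sequencing templates to facilitate the ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We weighed the desirability of double-ended sequencing (including the cost reduction from a smaller total number of templates) against the shorter read lengths of sequencing reactions performed with the M13RP1 (reverse) primer compared with the M13-21 (forward) primer. Approximately half of the templates were sequenced from both ends; in total, 9,297 M13RP1 sequencing reactions were performed. Random reverse sequencing reactions were performed on the basis of successful forward sequencing reactions. Some M13RP1 sequences were obtained in a semi-directed fashion: M13-21 sequences pointing outward at the ends of contigs were specifically chosen for M13RP1 sequencing in an effort to order the contigs. The semi-directed strategy was effective, and clone-based ordering formed an integral part of assembly and gap closure (see below).

4. Protocol for Automated Cycle Sequencing

Sequencing was carried out using eight ABI Catalyst robots and fourteen AB 373 automated DNA sequencers. The Catalyst robot is a publicly available, sophisticated pipetting and temperature-control robot developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates with a reaction mix consisting of deoxynucleotides and dideoxynucleotides, thermostable Taq DNA polymerase, fluorescently labeled sequencing primers and reaction buffer. Reaction mixes and templates were combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e., one-primer synthesis), each comprising denaturation, annealing of primer to template, and extension of DNA synthesis, were carried out. A heated lid with rubber gaskets on the thermocycling plate prevented evaporation without the need for an oil overlay.

Two sequencing protocols were used: dye-labeled primers and dye-labeled dideoxy chain terminators. Shotgun sequencing involves the use of four dye-labeled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labeled with a different fluorescent dye, permitting the four individual reactions to be combined in a single lane of the 373 DNA Sequencer for electrophoresis, detection and base calling. AB currently supplies premixed reaction mixes in bulk packages containing all the non-template reagents required for sequencing. Sequencing can be performed on both plasmid and PCR-generated templates, with both dye-primers and dye-terminators, with approximately equal fidelity, although plasmid templates generally give longer usable sequences.

Thirty-two reactions were loaded per 373 Sequencer each day, for a total of 960 samples. Electrophoresis was run overnight following the manufacturer's protocol, and data were collected for 12 h. Following electrophoresis and fluorescence detection, the AB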
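373 performs automatic lane tracking and base calling. Lane tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace) was inspected visually and assessed for quality. Low-quality sequences were tracked and removed, and the sequences themselves were entered into a Sybase database by software (archived daily to 8-mm tape). Leading vector polylinker sequences were removed automatically by a software program. The average edited sequence length from the standard ABI 373 was about 400 bp and depended largely on the quality of the template used for the sequencing reaction. All ABI 373 sequencers were converted to Stretch Liners, which provide a longer electrophoresis path before fluorescence detection and thereby increased the average number of usable bases to 500-600 bp.

Informatics

1. Data Management

A number of information management systems have been developed for large-scale sequencing laboratories (Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington, D.C., 585 (1993)). The system used to collect and assemble the sequence data was developed with the Sybase relational data management system and was designed to automate data flow as far as possible and to reduce user error. The database stores and correlates all information collected during the entire operation, from template preparation to final analysis of the genome. Because the raw output of the AB 373 sequencers was based on a Macintosh platform and the chosen data management system was based on a Unix platform, it was necessary to design and implement a variety of multi-user client-server applications that allow the raw data and the analysis results to flow seamlessly into the database with minimal user effort. A description of the software programs used for assembling and managing large sequences is provided in Figure 4.

2. Assembly

An assembly engine (TIGR Assembler) was developed for the rapid and accurate assembly of thousands of sequence fragments. The AB AutoAssembler™ was modified (and is called TIGR Editor) to provide a graphical interface to the electropherograms for editing data associated with the aligned-sequence file output of TIGR Assembler. TIGR Editor maintains synchronization between the electropherogram files on the Macintosh platform and the sequence data in the H. influenzae database on the Unix platform.

TIGR Assembler simultaneously clusters and assembles fragments of the genome. To obtain the speed necessary to assemble more than 10⁴ fragments, the algorithm builds a hash table of 10-bp oligonucleotide sequences to generate a table of sequence fragment overlaps. The number of potential overlaps of each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best-matching fragment on the basis of oligonucleotide content. The current contig and the candidate fragment are aligned using a modified version of the Smith-Waterman algorithm (Waterman, M.S., Methods in Enzymology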
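164:765 (1988)), which provides optimal gapped alignments. The current contig is extended by a fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions containing a possible repetitive element. Fragments representing the boundaries of repetitive elements, and potentially chimeric fragments, are often rejected on the basis of partial mismatches at the ends of the alignment and are excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information together with sequencing from both ends of each template. It enforces the constraint that sequence fragments from the two ends of the same template point toward one another in the contig and lie within a certain number of base pairs of each other (definable for each clone on the basis of the known clone size range for a given library).

The assembly of 24,304 sequence fragments of H. influenzae required 30 hours of CPU time on one processor of a SPARCenter 2000 with 512 MB of RAM. This process produced approximately 210 contigs. Because of the high stringency of TIGR Assembler, all contigs were searched against each other using grasta (a modified fasta (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988))). In this way, additional overlaps were detected that allowed the data set to be reduced to 140 contigs. Extensive information on the position of each fragment within the contigs and on the consensus sequences themselves was entered into the H. influenzae relational database.

3. Ordering of Assembled Contigs

After assembly, the relative positions of the 140 contigs were unknown. These contigs were ordered by asm_align, which uses a number of relationships to identify and order contigs that are adjacent to each other. Using this algorithm, the 140 contigs were placed into 42 groups, leaving a total of 42 physical gaps (no template DNA for the region) and 98 sequence gaps (template available to close the gap).

Four integrated strategies were developed to order the contigs separated by physical gaps and to close those gaps. Oligonucleotide primers were designed and synthesized from the end of each contig group, so that these primers could be used in one or more of the strategies outlined below:

1. Southern analysis was performed with a subset of 72 of these oligonucleotides to generate unique "fingerprints". This method was based on the premise that labeled oligonucleotides homologous to the ends of adjacent contigs should hybridize to common DNA restriction fragments and therefore share similar or identical hybridization patterns, or "fingerprints". The oligonucleotides were labeled using 50 pmol of each 20-mer, 250 mCi of [γ-³²P]ATP and T4 polynucleotide kinase. The labeled oligonucleotides were purified using Sephadex G-25 Superfine (Pharmacia), and 10⁷ cpm of each oligonucleotide was used in Southern hybridization analysis of H. influenzae Rd chromosomal DNA digested with one frequent cutter (AseI) and five infrequent cutters (BglII, EcoRI, PstI, XbaI and PvuII). The DNA from each digest was fractionated on a 0.7% agarose gel and transferred to Nytran Plus nylon membranes (Schleicher & Schuell). Hybridization was carried out at 40°C for 16 h. To remove nonspecific signals, each blot was washed sequentially at room temperature with stringency increasing up to 0.1X SSC + 0.5% SDS. The blots were exposed to a PhosphorImager cassette (Molecular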
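Dynamics) for several hours, and the hybridization patterns were compared visually. Adjacent contigs identified in this way were targeted for specific PCR reactions.

2. Peptide links were sought by searching the end of each contig against a peptide database using blastx (Altschul et al., J. Mol. Biol. 215:403 (1990)). If the ends of two contigs matched the same database sequence in an appropriate manner, the two contigs were tentatively considered to be adjacent.

3. The two lambda libraries constructed from H. influenzae genomic DNA were probed with oligonucleotides designed from the ends of contig groups (Kirkness et al., Genomics 10:985 (1991)). Positive plaques were then used to prepare templates, and sequence was determined from each end of the lambda clone insert. These sequence fragments were searched against a database of all contigs using grasta. Two contigs that matched sequences from opposite ends of the same lambda clone were ordered, and the lambda clone then provided the template for closing the sequence gap between the adjacent contigs. Lambda clones are particularly valuable for resolving repeat structures.

4. Standard and long-range (XL) PCR reactions were performed as follows to confirm the ordering of contigs found by the other methods and to establish the order of contigs that remained unordered. For standard PCR, each reaction contained 37 μl of mix: 16.5 μl of H₂O, 3 μl of 25 mM MgCl₂, 8 μl of dNTP mix (1.25 mM of each dNTP), 4.5 μl of 10X PCR Core Buffer II (Perkin Elmer) and 25 ng of H. influenzae Rd KW20 genomic DNA. The two appropriate primers (4 μl, 3.2 pmol/μl) were added to each reaction. A hot start was performed at 95°C for 5 min, followed by a hold at 75°C. During the hold, 0.3 μl of AmpliTaq DNA polymerase (Perkin Elmer) in 4.3 μl of H₂O and 0.5 μl of 10X PCR Core Buffer II was added to each reaction. The PCR profile was 25 cycles of denaturation at 94°C for 45 s, annealing at 55°C for 1 min and extension at 72°C for 3 min. All reactions were performed in a 96-well format in a Perkin-Elmer GeneAmp PCR System 9600.

Long-range PCR (XL PCR) was performed as follows. Each reaction contained 35.2 μl of mix: 12.0 μl of H₂O, 2.2 μl of 25 mM Mg(OAc)₂, 4 μl of dNTP mix (200 μM final concentration), 12.0 μl of 3.3X PCR buffer and 25 ng of H. influenzae Rd KW20 genomic DNA. The two appropriate primers (5 μl, 3.2 pmol/μl) were added to each reaction. A hot start was performed at 94°C for 1 min, and 2.0 μl of rTth polymerase (4 U/reaction) in 2.8 μl of 3.3X PCR Buffer II was added to each reaction. The PCR profile was 18 cycles of denaturation at 94°C for 15 s and annealing/extension at 62°C for 8 min, followed by 12 cycles of denaturation at 94°C for 15 s and annealing/extension at 62°C for 8 min (increasing by 15 s per cycle), and a final extension at 72°C for 10 min. All reactions were performed in a 96-well format in a Perkin-Elmer GeneAmp PCR System 9600.

PCR reactions were run for essentially every combination of physical gap ends, but techniques such as Southern fingerprinting, database matching and large-insert clone probing were particularly valuable for ordering adjacent contigs and for reducing the number of combinatorial PCR reactions needed to close the gaps completely. Using these strategies on a larger scale in future genome projects should also increase the overall efficiency of closing a genome completely. The number of physical gaps ordered and closed by each of these techniques is summarized in Table 5.

Sequence information obtained from the ends of 15-20 kb clones is particularly suitable for closing gaps, resolving repeat structures and providing general information on overall genome organization. We were also concerned that some fragments of the H. influenzae genome might not be clonable in high-copy-number plasmids in E. coli, and reasoned that lytic lambda clones would provide the DNA for these fragments. Approximately 100 random plaques were picked from the amplified lambda library, templates were prepared, and sequence information was obtained from each end. These sequences were searched against the contigs (grasta) and linked to their appropriate contigs in the database, providing a scaffold of lambda clones that further supported the accuracy of the genome assembly (Figure 5). In addition to confirming the contig structure, the lambda clones closed 23 physical gaps. Approximately 78% of the genome was covered by lambda clones.

Lambda clones were also useful for resolving repeat structures. The repeat structures identified in the genome were small enough to be spanned by a single clone from the random insert library, with the exception of the six ribosomal RNA operons and one 5,340-bp repeat (two copies). Oligonucleotide probes were designed from the unique flanking region at the start of each repeat and hybridized to the lambda libraries. Positive plaques were identified for each flank, and sequence fragments obtained from the ends of each clone were used to orient the repeats correctly within the genome.

Whether the six ribosomal RNA (rRNA) operons of H. influenzae (16S subunit, 23S subunit, 5S subunit) could be distinguished and assembled was a test of our overall strategy for sequencing and assembling complex genomes expected to contain a substantial number of repeat regions. Because of the high degree of sequence similarity and the length of the six operons, the assembly process collapsed all of the underlying sequences into a few indistinguishable contigs. To determine the correct placement of the operons in the sequence, a pair of unique flanking sequences was needed for each operon. No unique flanking sequence could be found at the left (16S rRNA) end; this region contains the ribosomal promoter and appeared to be unclonable in the high-copy-number pUC18 plasmid. Unique sequences could, however, be identified at the right (5S) end. Oligonucleotide primers were designed from these six flanking regions and used to probe the two lambda libraries. For each of the six rRNA operons, at least one positive plaque was identified that spanned the entire rRNA operon and carried unique flanking sequences at the 16S and 5S ends. These plaques provided templates for obtaining the unique sequences for each of the six rRNA operons.

Further confirmation of the overall structure of the assembled circular genome was obtained by comparing computer-generated restriction maps for the enzymes ApaI, SmaI and RsrII, based on the assembled sequence, with the predicted physical map of Redfield and Lee (Genetic Maps: Locus Maps of Complex Genomes, S.J. O'Brien, ed., Cold Spring Harbor Laboratory Press, New York, NY, 1990, 2110). The restriction fragments of the sequence-derived map agreed in size and relative order with those of the physical map (Figure 5).

Editing

The contigs were edited visually by assembling overlapping 10-kb segments using AB AutoAssembler® and Fast Data Finder™ hardware. AutoAssembler® provides a graphical interface to electropherogram data for editing. The electropherogram data were used to assign the most likely base at each position. Where discrepancies could not be resolved, or a clear assignment could not be made, the automatic base call was left unchanged. Changes to individual sequences were written to the electropherogram files, and a replication protocol (crash) was used to maintain synchronization of the sequence data between the H. influenzae database and the electropherogram files. After editing, the contigs were reassembled with TIGR Assembler before annotation.

Potential frameshifts detected during annotation of the genome were logged as reports in the database. These reports include the contig coordinates that the alignment software (praze) predicts as the most likely site of a missing or inserted base, and a display of the sequence alignment containing the frameshift. Apparent frameshifts were used to flag regions of sequence that might require further editing. Frameshifts were not corrected where clear electropherogram data were inconsistent with them. Frameshift editing was performed with TIGR Editor. The rRNA and other repeat regions prevented complete assembly of the circular genome by TIGR Assembler; the final assembly of the genome was accomplished using comb_asm, which splices multiple contigs together on the basis of short overlaps.

Accuracy of the Genome Sequence

The accuracy of the H. influenzae genome sequence is difficult to quantify, because very few H. influenzae sequences have been determined previously and most of those are from other strains. However, three accuracy parameters can be applied to the data. First, on the basis of database similarity, the number of apparent frameshifts in predicted H. influenzae genes is 148. Some of these apparent frameshifts are likely to lie in the database sequences rather than in our sequence, particularly considering that 49 of them are based on matches to hypothetical proteins from other organisms. Second, 188 bases in the genome remain ambiguous as N (1 per 9,735 bp). Combining these two types of "known" errors, we calculate a maximum sequence accuracy of 99.98%. Third, the average coverage was 6.5X, and 1% of the genome had only 1X coverage.

Gene Identification

An attempt was made to predict all coding regions of the H. influenzae Rd genome and to identify genes, tRNAs and rRNAs, as well as other features of the DNA sequence (e.g., repeats, regulatory sites, the origin of replication, nucleotide composition). Some of the readily apparent sequence features are described below.

The H. influenzae Rd genome is a circular chromosome of 1,830,121 bp. The overall G/C nucleotide content is approximately 38% (A = 31%, C = 19%, G = 19%, T = 31%, IUB = 0.035%). The G/C content of the genome was examined with windows of several lengths to look for overall structural features. With a 5,000-bp window, the G/C content is relatively uniform except for seven large G/C-rich regions and one A/T-rich region (Figure 5). The G/C-rich regions correspond to the locations of the six rRNA operons and a cryptic Mu-like prophage. Genes for several proteins similar to proteins encoded by bacteriophage Mu are located at approximately 1.56-1.59 Mbp in the genome. This region has a G/C content markedly higher than the H. influenzae average (about 50% G/C, compared with about 38% for the rest of the genome). The cause or significance of the A/T-rich region has not yet been determined.

The minimal origin of replication (oriC) of E. coli is a 245-bp region defined by three copies of a 13-bp repeat containing a GATC core sequence at one end and four copies of a 9-bp repeat containing a TTAT core sequence at the other end. The GATC sites are methylation targets that control replication, whereas the TTAT sites provide binding sites for DnaA, the first step in the replication process (Genes V, B. Lewin, ed. (Oxford University Press, New York, 1994), Chapters 18-19). A sequence of approximately 281 bp (coordinates 602,483-602,764) bounded by these same core sequences appears to define the origin of replication in H. influenzae Rd. These coordinates lie between the ribosomal operon groups rrnF, rrnE, rrnD and rrnA, rrnB, rrnC. These two groups of ribosomal operons are transcribed in opposite directions, and the placement of the origin is consistent with the polarity of their transcription.

Termination of replication in E. coli is signaled by two 23-bp termination sequences located about 100 kb on either side of the midpoint where the two replication forks meet. Two potential termination sequences that share a 10-bp core sequence with the E. coli termination sequences were identified in H. influenzae at coordinates 1,375,949-1,375,958 and 1,558,759-1,558,768. These two sets of coordinates cover approximately 100 kb around the point 180° opposite the proposed H. influenzae origin of replication.

Six rRNA operons were identified. Each rRNA operon contains three rRNA subunits and variable spacer regions in the following order: 16S subunit, spacer region, 23S subunit, 5S subunit. The lengths of these subunits are 1,539 bp, 2,653 bp and 116 bp, respectively. The G/C content of the three ribosomal subunits (50%) is higher than that of the genome as a whole, while the G/C content of the spacer regions (38%) matches the rest of the genome. The nucleotide sequences of the three rRNA subunits are 100% identical in all six ribosomal operons. The rRNA operons can be divided into two classes on the basis of the spacer region between the 16S and 23S sequences. The shorter spacer is 478 bp long (rrnB, rrnE and rrnF) and contains the gene for tRNA-Glu. The longer spacer is 723 bp long (rrnA, rrnC and rrnD) and contains the genes for tRNA-Ile and tRNA-Ala. The spacer regions are also 100% identical within each group of three operons. tRNA genes are also present at the 16S and 5S ends of two of the rRNA operons: the genes for tRNA-Arg, tRNA-His and tRNA-Pro are located at the 16S end of rrnE, while the genes for tRNA-Trp and tRNA-Asp are located at the 5S end of rrnA.

The predicted coding regions of the H. influenzae genome were initially defined by evaluating coding potential, using a codon frequency matrix derived from 122 H. influenzae coding sequences in GenBank, with the program GeneMark (Borodovsky and McIninch, Computers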
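Chem. 17(2):123 (1993)). The predicted coding region sequences (plus 300 bp of flanking sequence) were used to search a database of non-redundant bacterial proteins (NRBP) built specifically for annotation. All DNA coding regions were extracted from GenBank (release 85), and sequences from the same species were searched against each other; sequences with >97% similarity over regions of >100 nucleotides were combined. In addition, the sequences were translated and used in protein comparisons with all sequences in Swiss-Prot (release 30); sequences from the same species with >98% similarity over 33 amino acids were combined. NRBP consists of 21,445 sequences derived from 23,751 GenBank sequences and 11,183 Swiss-Prot sequences from 1,099 different species.

A total of 1,749 predicted coding regions were identified. The predicted coding regions of H. influenzae were searched with an algorithm that translates the query DNA sequence in the three plus-strand reading frames to search NRBP, identifies protein sequences that match the query, and aligns the protein-protein matches using praze, a modified Smith-Waterman algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988)). Where an insertion or deletion in the DNA sequence had produced a frameshift error, the alignment algorithm started at the region of maximum similarity and used the 300-bp flanking region to extend the alignment to the same database match in another frame. Regions known to contain frameshift errors were logged in the database and evaluated for possible correction. Predicted coding regions that remained unidentified, and the remaining intergenic sequences, were searched against a data set of all peptide sequences available from Swiss-Prot, PIR and GenBank. Identification of operon structure will be facilitated by experimental determination of transcription promoters and termination sites.

Each putatively identified H. influenzae gene was assigned to one of 102 biological role categories adapted from Riley (Riley, M., Microbiology Reviews 57(4):862 (1993)). Assignments were made by linking the protein sequences of the predicted coding regions to the Swiss-Prot sequences in the Riley database. Of the 1,749 predicted coding regions, 724 were not assigned a role: 384 had no database match, and 340 matched "hypothetical proteins" in the database. Role assignments were made for 1,025 predicted coding regions. A compilation of the predicted coding regions, their unique identifiers, three-letter gene designations, percent identity, percent similarity and amino acid match lengths is shown in Table 1(a). The complete annotated genome map of H. influenzae Rd is shown in Figures 6(A)-(D). This map places each predicted coding region on the H. influenzae chromosome, indicates its direction of transcription, and color-codes its role assignment. Role assignments are also shown in Figure 5.

Examination of the genes and chromosomal organization of H. influenzae Rd makes it possible to describe its metabolism: what the organism requires to survive as a free-living organism, its nutritional requirements for growth in the laboratory, and the features that distinguish it from other organisms, particularly those related to its pathogenicity and virulence. The genome would be expected to contain the full complement of certain classes of genes known to be essential for life. For example, there is a one-to-one correspondence between the published protein sequences of the E. coli ribosome and potential homologs in the H. influenzae database. Similarly, as shown in Table 1(a), an aminoacyl-tRNA synthetase for each amino acid is present in the genome. Finally, the tRNA genes were mapped on the genome: there are 54 identified tRNA genes, covering all 20 standard amino acids.

To survive as a free-living organism, H. influenzae must generate energy in the form of ATP by fermentation and/or electron transport. As a facultative anaerobe, H. influenzae Rd is known to ferment glucose, fructose, galactose, ribose, xylose and fucose (Dorocicz et al., J. Bacteriol. 175:7142 (1993)). The genes identified in Table 1(a) indicate that transport systems are available for uptake of these sugars by the phosphoenolpyruvate:phosphotransferase system (PTS) and by non-PTS mechanisms. The genes specifying the common phosphate carrier proteins of the PTS, enzyme I and HPr (ptsI and ptsH), as well as the glucose-specific crr gene, were identified; the ptsI, ptsH and crr genes make up the pts operon. However, we have not identified the membrane-bound glucose-specific enzyme II, which is required for glucose transport by the PTS. A complete PTS for fructose was identified. Genes encoding the complete glycolytic pathway and the production of fermentation end products were identified. An anaerobic respiratory mechanism available for growth was found by identifying genes encoding functional electron transport systems that use inorganic electron acceptors such as nitrate, nitrite and dimethyl sulfoxide.

The genes encoding three enzymes of the tricarboxylic acid (TCA) cycle appear to be absent from the genome. Citrate synthase, isocitrate dehydrogenase and aconitase were not found either by searching the predicted coding regions or by using the E. coli enzymes as peptide queries against the whole translated genome. This explains the very high glutamate level (1 g/L) required in defined culture media (Klein and Luginbuhl, J. Gen. Microbiol. 113:409 (1979)). Glutamate can be funneled into the TCA cycle by conversion to alpha-ketoglutarate by glutamate dehydrogenase. In the absence of a complete TCA cycle, glutamate probably serves as a carbon source for amino acid biosynthesis via precursors that branch from the TCA cycle. A functional electron transport system is available for ATP production using oxygen as the terminal electron acceptor.

Previously unanswered questions about pathogenicity and virulence can be addressed by examining certain classes of genes, such as adhesion and lipooligosaccharide biosynthesis genes. Moxon and coworkers (Weiser et al., Cell 59:657 (1987)) have obtained evidence that these virulence-associated genes contain tandem tetrameric repeats that undergo frequent addition and deletion of one or more repeat units during replication, which shifts the reading frame of the gene and thereby alters its expression. With the complete genome sequence, it is now possible to locate all such tandem repeat tracts (Figure 5) and to begin determining their role in the phase variation of such potential virulence genes.

H. influenzae Rd has a highly efficient natural DNA transformation system (Kahn and Smith, J. Membrane Biol. 81:89 (1984)). A specific DNA uptake site, 5'-AAGTGCGGT, present in many copies in the genome, has been shown to be required for efficient DNA uptake. It is now possible to locate all of these sites and to describe their distribution completely. Fifteen genes involved in transformation have already been described and sequenced (Redfield, R., J. Bacteriol. 173:5612 (1991); Chandler, M.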
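, Proc. Natl. Acad. Sci. U.S.A. 89:1616 (1992); Barouki and Smith, J. Bacteriol. 163(2):629 (1985); Tomb et al., Gene 104:1 (1991); Tomb, J., Proc. Natl. Acad. Sci. U.S.A. 89:10252 (1992)). Six genes, comA through comF, make up an operon that is apparently controlled by a 22-bp palindromic competence regulatory element (CRE) located about one helical turn upstream of the promoter. The rec-2 transformation gene is also controlled by this element. It is now possible to locate additional copies of the CRE in the genome and to discover potential transformation genes under CRE control. Moreover, a wide range of other regulatory elements can now readily be discovered, which was not previously possible.

One well-described gene regulatory system in bacteria is the "two-component" system, consisting of a sensor molecule that detects certain environmental signals and a regulator molecule that is phosphorylated by the activated form of the sensor. The regulatory proteins are generally transcription factors that switch a particular set of genes on or off when activated by the sensor (for reviews, see Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)). E. coli is thought to have 40 sensor-regulator pairs (Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)). The H. influenzae genome was searched with tblastn and tfasta using representative proteins from each family of sensor and regulatory proteins. Four sensors and five regulatory proteins similar to proteins from other species were identified (Table 6). With the exception of CpxR, there appears to be a corresponding sensor for each regulatory protein. Searching with the CpxA protein from E. coli identified three of the four sensors shown in Table 6, but no additional significant matches were found; the level of sequence similarity may be too low to be detected by tfasta. No representatives of the NtrC class of regulatory proteins were found. This class of proteins interacts directly with the sigma-54 subunit of RNA polymerase, which is not present in H. influenzae. The regulatory proteins all fall into the OmpR subclass (Albright et al., Ann. Rev. Genet. 23:311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26:71 (1992)). The phoBR and basRS genes of H. influenzae are adjacent to each other and probably form operons. The nar and arc genes are not located adjacent to each other.

Among the most interesting questions that the complete genome sequence can answer is which genes or pathways are missing. The nonpathogenic H. influenzae Rd strain differs markedly from pathogenic serotype b strains, and many of the differences between them appear to be factors that affect infectivity. For example, it has now been shown that the eight genes forming the fimbrial gene cluster involved in adhesion of the bacteria to host cells (van Ham et al., Mol. Microbiol. 13:673 (1994)) are absent from the Rd strain. The pepN and purE genes, which flank the fimbrial cluster in H. influenzae type b strains, are adjacent to each other in the Rd strain (Figure 7), suggesting that the entire fimbrial cluster has been excised.

At a broader level, using a non-redundant set of protein-coding genes from E. coli, namely 1,216 predicted protein sequences from the University of Wisconsin Genome Project contigs in GenBank (accessions D10483, L10328, U00006, U00039, U14003 and U118997) (Yura et al., Nucleic Acids Research 20:3305 (1992); Burland et al., Genomics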
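16:551 (1993)), we determined which E. coli proteins are absent from H. influenzae. The minimum threshold for a match was set so that even weak matches were scored as positive, thereby giving a minimum estimate of the E. coli genes absent from H. influenzae. Each E. coli protein was searched against the complete genome using tblastn, and a blast score of >100 was considered a match. In total, 627 E. coli proteins matched at least one region of the H. influenzae genome, and 589 proteins did not. Examination of the 589 unmatched proteins showed that they contain a disproportionately large number of E. coli hypothetical proteins: 68% of the identified E. coli proteins were matched by H. influenzae sequences, whereas only 38% of the hypothetical proteins were matched. Proteins are annotated as hypothetical when they do not match any other known protein (Yura et al., Nucleic Acids Research 20:3305 (1992); Burland et al., Genomics 16:551 (1993)). At least two possible explanations can be offered for the overrepresentation of hypothetical proteins among the unmatched proteins: either the hypothetical proteins are not actually translated (at least not in the annotated frame), or they are E. coli-specific proteins that do not appear to be found in any other species except the most closely related ones, such as Salmonella typhimurium.

A total of 384 predicted coding regions showed little similarity to six-frame translations of GenBank release 87. These unidentified coding regions were compared with one another using fasta, and several novel gene families were identified. For example, two predicted coding regions with no database match (HI0591 and HI0852) share 75% identity over nearly their entire lengths (139 and 143 amino acid residues, respectively). The fact that these regions are similar to each other but match no protein in the current databases suggests that they may represent novel cellular functions. Other types of analysis can be applied to the unidentified coding regions, including hydropathy analysis, which reveals patterns of potential membrane-spanning domains that are often conserved among members of receptor and transporter gene families even in the absence of significant amino acid identity. Figure 8 shows five examples of unidentified predicted coding regions with potential transmembrane domains in a periodic pattern characteristic of membrane-bound channel proteins. Such information can be used to focus on specific aspects of cellular function affected by targeted deletion or mutation of these genes.

Interest in the medically important features of H. influenzae biology has focused particularly on the genes that determine the virulence characteristics of the organism. Recently, the catalase gene was characterized and sequenced as a possible virulence-associated gene (Bishai et al., J. Bacteriol. 176:2914 (1994)). A number of genes responsible for the capsular polysaccharide have been mapped and sequenced (Kroll et al., Mol. Microbiol. 5(6):1549 (1991)). Several outer membrane protein genes have been identified and sequenced (Langford et al., J. Gen. Microbiol. 138:155 (1992)). The lipooligosaccharide component of the outer membrane and the genes of its synthetic pathway have been studied intensively (Weiser et al., J. Bacteriol. 172:3304 (1990)). Although vaccines are available, studies of outer membrane components have been motivated in part by the need to improve them.

Data Availability

The H. influenzae genome sequence has been deposited under accession number L42023 in the Genome Sequence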
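DataBase (GSDB) under accession number L42023. The nucleotide sequence and peptide translation of each predicted coding region, with identified start and stop codons, have also been accepted by GSDB.

Production of antibodies to H. influenzae proteins
Substantially pure protein or polypeptide is isolated from transfected or transformed cells using any of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. The concentration of protein in the final preparation is adjusted, for example by concentration on an Amicon filter device, to a level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows.

Monoclonal antibody production by hybridoma fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C. (Nature 256:495 (1975)) or modifications thereof. Briefly, a mouse is repeatedly inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused with mouse myeloma cells by means of polyethylene glycol, and the excess unfused cells are destroyed by growing the system on selective medium containing aminopterin (HAT medium). The successfully fused cells are diluted, and aliquots of the dilution are placed in the wells of a microtiter plate, where growth of the culture is continued. Antibody-producing clones are identified by detecting antibody in the supernatant fluid of the wells by immunoassay procedures, such as the ELISA originally described by Engvall, E. (Meth. Enzymol. 70:419 (1980)) and modifications thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. (Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989)).

Polyclonal antibody production by immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and to the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Host animals also vary in response to the site of inoculation and dose, with either inadequate or excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. (J. Clin. Endocrinol. Metab. 33:988-991 (1971)). Booster injections can be given at regular intervals, and antiserum harvested when its antibody titer, as determined semi-quantitatively (for example, by double immunodiffusion in agar against known concentrations of the antigen), begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, D., ed., Blackwell (1973). The plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM). The affinity of the antiserum for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2nd ed., Rose and Friedman, eds., Amer. Soc. for Microbiology, Washington, D.C. (1980). Antibody preparations prepared according to either protocol are useful in quantitative immunoassays that determine the concentration of antigen-bearing substances in biological samples;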
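they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.

Preparation of PCR primers and amplification of DNA
Various fragments of the H. influenzae Rd genome, such as those disclosed in Tables 1(a) and 2, can be used in accordance with the present invention to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases, in length. When selecting primer sequences, it is preferable that the two primers of a pair have approximately the same G/C ratio, so that their melting temperatures are approximately the same. The PCR primers and amplified DNA of this example find use in the examples that follow.

Gene expression from DNA sequences corresponding to ORFs
A fragment of the H. influenzae Rd genome provided in Table 1(a) or 2 is introduced into an expression vector using conventional techniques. (Techniques for transferring cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect, or bacterial expression systems are well known in the art.) Commercially available vectors and expression systems can be obtained from a variety of suppliers, including Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and promote proper protein folding, the codon context and codon pairing of the sequence can be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Pat. No. 5,082,767, incorporated herein by reference.

The following is provided as one exemplary method for generating polypeptide(s) from a cloned ORF of a Haemophilus genome fragment. Because the ORF is of bacterial origin, it lacks a poly A sequence. This sequence can be added to the construct by, for example, splicing the poly A sequence out of pSG5 (Stratagene) with the BglI and SalI restriction endonucleases and incorporating it into the mammalian vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene from Moloney murine leukemia virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the herpes simplex thymidine kinase promoter and a selectable neomycin gene. The Haemophilus DNA is obtained by PCR from the bacterial vector using oligonucleotide primers that are complementary to the Haemophilus DNA and that carry a PstI restriction endonuclease sequence incorporated into the 5' primer and a BglII sequence at the 5' end of the corresponding Haemophilus DNA 3' primer, taking care that the Haemophilus DNA is positioned so that it is followed by the poly A sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt-ended with an exonuclease, digested with BglII, purified, and ligated to pXT1, which now contains a poly A sequence and has been digested with BglII. The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under the conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418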
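(Sigma, St. Louis, Mo.). The protein is preferably released into the supernatant. However, if the protein has membrane-binding domains, it may be retained within the cell, or its expression may be restricted to the cell surface. Because it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Haemophilus DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Haemophilus DNA. If antibody production is not possible, the Haemophilus DNA sequence is additionally incorporated into a eukaryotic expression vector and expressed, for example, as a chimera with β-globin. Antibody to β-globin is used to purify the chimera. Corresponding protease cleavage sites engineered between the β-globin gene and the Haemophilus DNA are then used to separate the two polypeptide fragments from each other after translation. One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene). This vector encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. The techniques described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (cited above), and many of the methods are available from the technical assistance representatives of Stratagene, Life Technologies, Inc., or Promega. Polypeptide can additionally be produced from either construct using an in vitro translation system such as the In vitro Express™ Translation Kit (Stratagene).

Although the present invention has been described in some detail for purposes of clarity and understanding, those skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention. All patents, patent applications, and publications referred to above are hereby incorporated by reference.

Fatty acid/phospholipid metabolism
Acetyl coenzyme A acetyltransferase (thiolase) (fadA) {Clostridium acetobutylicum}
fadR protein involved in fatty acid metabolism (fadR) {E. coli}
(3R)-hydroxymyristoyl acyl carrier protein dehydrase (fabZ) {E. coli}
3-ketoacyl-acyl carrier protein reductase (fabG) {E. coli}
Acetyl-CoA carboxylase (accA) {E. coli}
Acyl carrier protein (acpP) {E. coli}
Acyl-CoA thioesterase II (tesB) {E. coli}
Beta-ketoacyl-ACP synthase (fabB) {E. coli}
Beta-ketoacyl-acyl carrier protein synthase III (fabH) {E. coli}
Biotin carboxyl carrier protein (accB) {E. coli}
Biotin carboxylase (accC) {E. coli}
D-3-hydroxydecanoyl-(acyl carrier protein) dehydratase (fabA) {E. coli}
Diacylglycerol kinase (dgkA) {E. coli}
Long-chain fatty acid CoA ligase {Homo sapiens}
Malonyl coenzyme A-acyl carrier protein transacylase (fabD) {E. coli}
Short-chain alcohol dehydrogenase homolog (envM) {E. coli}
USG-1 protein (usg) {E. coli}
1-acyl-glycerol-3-phosphate acyltransferase (plsC) {E. coli}
CDP-diglyceride synthetase (cdsA) {E. coli}
Glycerol-3-phosphate acyltransferase (plsB) {E. coli}
Phosphatidylglycerophosphate phosphatase B (pgpB) {E. coli}
Phosphatidylglycerophosphate synthase (pgsA) {E. coli}
Phosphatidylserine decarboxylase proenzyme (psd) {E. coli}
Phosphatidylserine synthase (pssA) {E. coli}
Protein D (hpd) {H. influenzae}

Purines, pyrimidines, nucleosides and nucleotides
Purine ribonucleotide biosynthesis
5'-phosphoribosyl-5-amino-4-imidazole carboxylase II (purK) {E. coli}
5'-phosphoribosyl-5-aminoimidazole synthetase (purM) {E. coli}
5'-guanylate kinase (gmk) {E. coli}
Adenylate kinase (ATP-AMP transphosphorylase) (adk) {H. influenzae}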
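Adenylosuccinate lyase (purB) {E. coli}
Adenylosuccinate synthetase (purA) {E. coli}
Amidophosphoribosyltransferase (purF) {E. coli}
Formylglycinamide ribonucleotide synthetase (purL) {E. coli}
Formyltetrahydrofolate hydrolase (purU) {E. coli}
guaA protein (guaA) {E. coli}
Inosine-5'-monophosphate dehydrogenase (guaB) {Acinetobacter calcoaceticus}
Nucleoside diphosphate kinase (ndk) {E. coli}
Phosphoribosylamine-glycine ligase (purD) {E. coli}
Phosphoribosylaminoimidazole carboxylase catalytic subunit (purE) {H. influenzae}
Phosphoribosylaminoimidazolecarboxamide formyltransferase (purH) {E. coli}
Phosphoribosylglycinamide formyltransferase (purN) {E. coli}
Phosphoribosylpyrophosphate synthetase (prsA) {S. typhimurium}
SAICAR synthetase (purC) {Klebsiella pneumoniae}
Pyrimidine ribonucleotide biosynthesis
Dihydroorotate dehydrogenase (dihydroorotate oxidase) (pyrD) {E. coli}
Orotate phosphoribosyltransferase (pyrE) {E. coli}
pyrF operon-encoded orotidine 5'-monophosphate (OMP) decarboxylase {E. coli}
pyrF protein (pyrF) {E. coli}
Uracil phosphoribosyltransferase (pyrR) {Bacillus caldolyticus}
2'-Deoxyribonucleotide metabolism
Anaerobic ribonucleoside-triphosphate reductase (nrdD) {E. coli}
Deoxycytidine triphosphate deaminase (dcd) {E. coli}
Deoxyuridine triphosphatase (dut) {E. coli}
Glutaredoxin (grx) {E. coli}
nrdB protein (nrdB) {E. coli}
Ribonucleoside-diphosphate reductase 1 alpha chain (nrdA) {E. coli}
Thioredoxin reductase (trxB) {E. coli}
Thymidylate synthetase (thyA) {E. coli}
Salvage of nucleosides and nucleotides
2',3'-cyclic-nucleotide 2'-phosphodiesterase (cpdB) {E. coli}
Adenine phosphoribosyltransferase (apt) {E. coli}
Adenosine tetraphosphatase (apaH) {E. coli}
Cytidine deaminase (cytidine aminohydrolase) (cda) {E. coli}
Cytidylate kinase (cmk) {E. coli}
Cytidylate kinase (cmk) {E. coli}
Purine-nucleoside phosphorylase (deoD) {E. coli}
Thymidine kinase (tdk) {E. coli}
Uracil phosphoribosyltransferase (upp) {E. coli}
Uridine phosphorylase (udp) {E. coli}
Xanthine-guanine phosphoribosyltransferase gpt (xgprt) {E. coli}
Xanthine-guanine phosphoribosyltransferase (xgprt) {S. typhimurium}
Putative ATPase (mrp) {E. coli}
Sugar-nucleotide biosynthesis, conversions
5'-nucleotidase (ushA) {Homo sapiens}
CMP-NeuNAc synthetase (siaB) {Neisseria meningitidis}
Galactose-1-phosphate uridylyltransferase (galT) {H. influenzae}
Glucose-phosphate uridylyltransferase (galU) {E. coli}
UDP-glucose 4-epimerase (galactowaldenase) (galE) {H. influenzae}
UDP-N-acetylglucosamine pyrophosphorylase (glmU) {E. coli}
Nucleotide and nucleoside interconversions
Deoxyguanosine triphosphate triphosphohydrolase (dgt) {E. coli}
Uridine kinase (uridine monophosphokinase) (udk) {E. coli}

Regulatory functions
Adenylate cyclase (cyaA) {H. influenzae}
Aerobic respiration control protein ArcA (dye resistance protein) (arcA) {E. coli}
Aerobic respiration control sensor protein (arcB) {E. coli}
araC-like transcriptional regulator {Streptomyces lividans}
Arginine repressor protein (argR) {E. coli}
arsC protein (arsC) {plasmid R773}
ATP-dependent proteinase (lon) {E. coli}
ATP:GTP 3'-pyrophosphotransferase (relA) {E. coli}
Carbon starvation protein (cstA) {E. coli}
Carbon storage regulator (csrA) {E. coli}
Cyclic AMP receptor protein (crp) {H. influenzae}
Cyclic AMP receptor protein (crp) {H. influenzae}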
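cys regulon transcriptional activator (cysB) {E. coli}
Ferric uptake regulation protein (fur) {E. coli}
Fimbrial transcription regulation repressor (pilB) {Neisseria gonorrhoeae}
Fimbrial transcription regulation repressor (pilB) {Neisseria gonorrhoeae}
Folylpolyglutamate-dihydrofolate synthetase expression regulator (accD) {E. coli}
Fumarate (and nitrate) reduction regulatory protein (fnr) {E. coli}
Galactose operon repressor (galS) {H. influenzae}
Glucokinase regulator {Rattus norvegicus}
Glycerol-3-phosphate regulon repressor (glpR) {E. coli}
Glycerol-3-phosphate regulon repressor (glpR) {E. coli}
Glycine cleavage system transcriptional activator (gcvA) {E. coli}
GTP-binding protein (era) {E. coli}
GTP-binding protein (obg) {B. subtilis}
Hydrogen peroxide-inducible activator (oxyR) {E. coli}
L-fucose operon activator (fucR) {E. coli}
lacZ expression regulator (icc) {E. coli}
Leucine-responsive regulatory protein (lrp) {E. coli}
Leucine-responsive regulatory protein (lrp) {E. coli}
LexA repressor (lexA) {E. coli}
Lipooligosaccharide protein (lex2A) {H. influenzae}
Lipooligosaccharide protein (lex2A) {H. influenzae}
metF aporepressor (metJ) {E. coli}
Molybdenum transport system alternative nitrogenase regulator (modD) {Rhodobacter capsulatus}
msbB protein (msbB) {E. coli}
msbB protein (msbB) {E. coli}
Negative regulator of translation (relB) {E. coli}
Negative rpo regulator (mclA) {E. coli}
Nitrate sensor protein (narQ) {E. coli}
Nitrate/nitrite response regulator protein (narP) {E. coli}
Nitrogen regulatory protein P-II (glnB) {E. coli}
Guanosine pentaphosphate 3'-pyrophosphohydrolase (spoT) {E. coli}
Phosphate regulon sensor protein (phoR) {E. coli}
Phosphate regulon transcriptional regulatory protein (phoB) {E. coli}
Putative nadAB transcriptional regulator (nadR) {E. coli}
Purine nucleotide synthesis repressor protein (purR) {E. coli}
Putative murein gene regulator (bolA) {E. coli}
rbs repressor (rbsR) {E. coli}
Regulatory protein (asnC) {E. coli}
Regulatory protein sfs1 involved in maltose metabolism (sfsA) {E. coli}
Repressor for cytochrome P450 (Bm3R1) {Bacillus megaterium}
RNA polymerase sigma-32 factor (heat shock regulatory protein F33.4) (rpoH) {E. coli}
RNA polymerase sigma-70 factor (rpoD) {E. coli}
RNA polymerase sigma-E factor (rpoE) {E. coli}
Sensor protein for basR (basS) {E. coli}
Stringent starvation protein (sspB) {E. coli}
Stringent starvation protein A (sspA) {H. influenzae}
Trans-activator for metE and metH (metR) {H. influenzae}
Transcriptional activator (tenA) {B. subtilis}
Transcriptional activator protein (ilvY) {E. coli}
Transcriptional regulatory protein (basR) {E. coli}
Transcriptional regulatory protein (tyrR) {E. coli}
Tryptophan repressor (trpR) {Enterobacter aerogenes}
uxu operon regulator (uxuR) {E. coli}
Xylose operon regulatory protein (xylR) {E. coli}

Replication
DNA replication, restriction/modification, recombination
A/G-specific adenine glycosylase (mutY) {E. coli}
Chromosomal replication initiator protein (dnaA) {E. coli}
Chromosomal replication initiator protein (dnaA) {E. coli}
Crossover junction endodeoxyribonuclease (ruvC) {E. coli}
dfp protein (dfp) {E. coli}
DNA adenine methylase (dam) {E. coli}
DNA gyrase, subunit A (gyrA) {E. coli}
DNA gyrase, subunit B (gyrB) {E. coli}
DNA helicase II (uvrD) {H. influenzae}
DNA ligase (lig) {E. coli}
DNA mismatch protein (mutH) {E. coli}
DNA mismatch repair protein (mutS) {E. coli}
DNA mismatch repair protein MutL (mutL) {E. coli}
DNA polymerase I (polA) {E. coli}
DNA polymerase III beta subunit (dnaN) {E. coli}
DNA polymerase III delta prime subunit (holB) {E. coli}
DNA polymerase III delta subunit (holA) {E. coli}
DNA polymerase III epsilon subunit (dnaQ) {E. coli}
DNA polymerase III, alpha chain (dnaE) {E. coli}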
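DNA polymerase III, chi subunit (holC) {H. influenzae}
DNA polymerase III, psi subunit (holD) {E. coli}
DNA primase (dnaG) {E. coli}
DNA recombinase (recG) {E. coli}
DNA repair protein (recN) {E. coli}
DNA topoisomerase I (topA) {B. subtilis}
DNA-3-methyladenine glycosidase I (tagI) {E. coli}
DNA-dependent ATPase, DNA helicase (recQ) {E. coli}
dod protein (dod) {Serratia marcescens}
Dosage-dependent dnaK suppressor protein (dksA) {E. coli}
Formamidopyrimidine-DNA glycosylase (fpg) {E. coli}
Glucose-inhibited division protein (gidA) {E. coli}
Glucose-inhibited division protein (gidB) {E. coli}
Hin recombinational enhancer binding protein (fis) {E. coli}
HincII endonuclease (HincII) {H. influenzae}
HindIII modification methyltransferase (hindIIIM) {H. influenzae}
HindIII restriction endonuclease (hindIIIR) {H. influenzae}
Holliday junction DNA helicase (ruvA) {E. coli}
Holliday junction DNA helicase (ruvB) {E. coli}
Integrase/recombinase protein (xerC) {E. coli}
Integration host factor alpha-subunit (himA) {E. coli}
Integration host factor beta-subunit (IHF-beta) (himD) {E. coli}
Methylated-DNA-protein-cysteine methyltransferase (datI) {B. subtilis}
mioC protein (mioC) {E. coli}
Modification methylase HgiDI (MHgiDI) {Herpetosiphon aurantiacus}
Modification methylase HincII (hincIIM) {H. influenzae}
Mutator mutT (AT-GC mutation) {E. coli}
Negative regulator of initiation of replication (seqA) {E. coli}
Primosomal protein n precursor (priB) {E. coli}
Primosomal protein replication factor (priA) {E. coli}
Probable ATP-dependent helicase (dinG) {E. coli}
recF protein (recF) {E. coli}
recO protein (recO) {E. coli}
Recombinase (recA) {H. influenzae}
Recombination protein (rec2) {H. influenzae}
recR protein (recR) {E. coli}
Regulatory protein (recX) {Pseudomonas fluorescens}
rep helicase (rep) {E. coli}
Replication protein (dnaX) {E. coli}
Replicative DNA helicase (dnaB) {E. coli}
Restriction enzyme (hgiDIR) {Herpetosiphon giganteus}
S-adenosylmethionine synthetase 2 (metX) {E. coli}
Shufflon-specific DNA recombinase (rci) {E. coli}
Single-stranded DNA-binding protein (ssb) {H. influenzae}
Site-specific recombinase (rcb) {E. coli}
Topoisomerase I (topA) {E. coli}
Topoisomerase III (topB) {E. coli}
Topoisomerase IV subunit A (parC) {E. coli}
Topoisomerase IV subunit B (parE) {E. coli}
Transcription-repair coupling factor (trcF) (mfd) {E. coli}
Type I restriction enzyme EcoKI specificity protein (hsdS) {E. coli}
Type I restriction enzyme EcoR124/3I M protein (hsdM) {E. coli}
Type I restriction enzyme EcoR124/3I M protein (hsdM) {E. coli}
Type I restriction enzyme EcoR124/3 R protein (hsdR) {E. coli}
Type III restriction-modification EcoP15 enzyme (mod) {E. coli}
Uracil DNA glycosylase (ung) {E. coli}
xprB protein (xerD) {E. coli}
Degradation of DNA
Endonuclease III (nth) {E. coli}
Excinuclease ABC subunit A (uvrA) {E. coli}
Excinuclease ABC subunit B (uvrB) {E. coli}
Excinuclease ABC subunit C (uvrC) {E. coli}
Exodeoxyribonuclease I (sbcB) {E. coli}
Exodeoxyribonuclease V (recB) {E. coli}
Exodeoxyribonuclease V (recC) {E. coli}
Exodeoxyribonuclease V (recD) {E. coli}
Exonuclease III (xthA) {E. coli}
Exonuclease VII, large subunit (xseA) {E. coli}
Single-stranded-DNA-specific exonuclease (recJ) {E. coli}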
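
Transcription
RNA synthesis, modification, and DNA transcription
ATP-dependent helicase HepA (hepA) {E. coli}
ATP-dependent RNA helicase (srmB) {E. coli}
ATP-dependent RNA helicase DeaD (deaD) {E. coli}
DNA-directed RNA polymerase alpha chain (rpoA) {E. coli}
DNA-directed RNA polymerase beta chain (rpoB) {S. typhimurium}
DNA-directed RNA polymerase beta chain (rpoC) {E. coli}
N utilization substance protein B (nusB) {E. coli}
Plasmid copy number control protein (pcnB) {E. coli}
Polynucleotide phosphorylase (pnp) {E. coli}
Putative ATP-dependent RNA helicase (rhlB) {E. coli}
RNA polymerase omega subunit (rpoZ) {E. coli}
Sigma factor (algU) {Pseudomonas aeruginosa}
Transcription antitermination protein (nusG) {E. coli}
Transcription elongation factor (greB) {E. coli}
Transcription factor (nusA) {S. typhimurium}
Transcription termination factor rho (rho) {E. coli}
Degradation of RNA
Anticodon nuclease masking agent (prrD) {E. coli}
Exoribonuclease II (RNase II) {E. coli}
Ribonuclease D (rnd) {E. coli}
Ribonuclease E (rne) {E. coli}
Ribonuclease H (rnh) {E. coli}
Ribonuclease HII (EC 3.1.26.4) (RNASEHII) {E. coli}
Ribonuclease III (rnc) {E. coli}
Ribonuclease PH (rph) {E. coli}
RNase P (rnpA) {E. coli}
RNase T (rnt) {E. coli}

Translation
Ribosomal proteins: synthesis and modification
Ribosomal protein L1 (rpL1) {E. coli}
Ribosomal protein L10 (rpL10) {S. typhimurium}
Ribosomal protein L11 (rpL11) {E. coli}
Ribosomal protein L11 methyltransferase (prmA) {E. coli}
Ribosomal protein L13 (rpL13) {H. influenzae}
Ribosomal protein L14 (rpL14) {E. coli}
Ribosomal protein L15 (rpL15) {E. coli}
Ribosomal protein L16 (rpL16) {E. coli}
Ribosomal protein L17 (rplQ) {E. coli}
Ribosomal protein L18 (rpL18) {E. coli}
Ribosomal protein L19 (rpL19) {E. coli}
Ribosomal protein L2 (rpL2) {E. coli}
Ribosomal protein L20 (rpL20) {E. coli}
Ribosomal protein L21 (rpL21) {E. coli}
Ribosomal protein L22 (rpL22) {E. coli}
Ribosomal protein L23 (rpL23) {E. coli}
Ribosomal protein L24 (rpL24) {E. coli}
Ribosomal protein L25 (rpL25) {E. coli}
Ribosomal protein L27 (rpL27) {E. coli}
Ribosomal protein L28 (rpL28) {E. coli}
Ribosomal protein L29 (rpL29) {E. coli}
Ribosomal protein L3 (rpL3) {E. coli}
Ribosomal protein L30 (rpL30) {E. coli}
Ribosomal protein L31 (rpL31) {E. coli}
Ribosomal protein L32 (rpL32) {E. coli}
Ribosomal protein L33 (rpL33) {E. coli}
Ribosomal protein L34 (rpL34) {E. coli}
Ribosomal protein L35 (rpL35) {E. coli}
Ribosomal protein L4 (rpL4) {E. coli}
Ribosomal protein L5 (rpL5) {E. coli}
Ribosomal protein L6 (rpL6) {E. coli}
Ribosomal protein L7/L12 (rpL7/L12) {E. coli}
Ribosomal protein L9 (rpL9) {E. coli}
Ribosomal protein S1 (rpS1) {E. coli}
Ribosomal protein S10 (rpS10) {E. coli}
Ribosomal protein S11 (rpS11) {E. coli}
Ribosomal protein S13 (rpS13) {E. coli}
Ribosomal protein S14 (rpS14) {E. coli}
Ribosomal protein S15 (rpS15) {E. coli}
Ribosomal protein S15 (rpS15) {E. coli}
Ribosomal protein S16 (rpS16) {E. coli}
Ribosomal protein S17 (rplQ) {E. coli}
Ribosomal protein S18 (rpS18) {E. coli}
Ribosomal protein S19 (rpS19) {E. coli}
Ribosomal protein S2 (rpS2) {E. coli}
Ribosomal protein S21 (rpS21) {E. coli}
Ribosomal protein S3 (rpS3) {E. coli}
Ribosomal protein S4 (rpS4) {E. coli}
Ribosomal protein S5 (rpS5) {E. coli}
Ribosomal protein S6 (rpS6) {E. coli}
Ribosomal protein S6 modification protein (rimK) {E. coli}
Ribosomal protein S7 (rpS7) {E. coli}
Ribosomal protein S8 (rpS8) {E. coli}
Ribosomal protein S9 (rpS9) {Haemophilus somnus}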
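Ribosomal-protein-alanine acetyltransferase (rimI) {E. coli}
Streptomycin resistance protein (strA) {H. influenzae}
Aminoacyl tRNA synthetases, tRNA modification
Alanyl-tRNA synthetase (alaS) {E. coli}
Arginyl-tRNA synthetase (argS) {E. coli}
Asparaginyl-tRNA synthetase (asnS) {E. coli}
Aspartyl-tRNA synthetase (aspS) {E. coli}
cys-tRNA synthetase (cysS) {E. coli}
Cysteinyl-tRNA(Ser) selenium transferase (selA) {E. coli}
Glutaminyl-tRNA synthetase (glnS) {E. coli}
Glutamyl-tRNA synthetase (gltX) {E. coli}
Glycyl-tRNA synthetase alpha chain (glyQ) {E. coli}
Glycyl-tRNA synthetase beta chain (glyS) {E. coli}
Histidyl-tRNA synthetase (hisS) {E. coli}
Isoleucyl-tRNA ligase (ileS) {E. coli}
Leucyl-tRNA synthetase (leuS) {E. coli}
Lysyl-tRNA synthetase (lysU) {E. coli}
Lysyl-tRNA synthetase homolog (genX) {E. coli}
Methionyl-tRNA formyltransferase (fmt) {E. coli}
Methionyl-tRNA synthetase (metG) {E. coli}
Peptidyl-tRNA hydrolase (pth) {E. coli}
Phenylalanyl-tRNA synthetase alpha subunit (pheS) {E. coli}
Phenylalanyl-tRNA synthetase beta subunit (pheT) {E. coli}
Prolyl-tRNA synthetase (proS) {E. coli}
Pseudouridylate synthase I (hisT) {E. coli}
Queuosine biosynthesis protein (queA) {E. coli}
Selenium metabolism protein (selD) {E. coli}
Seryl-tRNA synthetase (serS) {E. coli}
Threonyl-tRNA synthetase (thrS) {E. coli}
Transfer RNA-guanine transglycosylase (tgt) {E. coli}
tRNA (guanine-N1)-methyltransferase (M1G-methyltransferase) (trmD) {E. coli}
tRNA (uracil-5-)-methyltransferase (trmA) {E. coli}
tRNA delta(2)-isopentenylpyrophosphate transferase (trpX) {E. coli}
tRNA nucleotidyltransferase (cca) {E. coli}
tRNA-guanine transglycosylase (tgt) {E. coli}
Tryptophanyl-tRNA synthetase (trpS) {E. coli}
Tyrosyl-tRNA synthetase (tyrS) {Thiobacillus ferrooxidans}
Valyl-tRNA synthetase (valS) {E. coli}
Nucleoproteins
DNA-binding protein (probable) {B. subtilis}
DNA-binding protein (rdgB) {Erwinia carotovora}
DNA-binding protein H-NS (hns) {E. coli}
DNA-binding protein HU-alpha (NS2) (HU-2) {E. coli}
Protein translation and modification
Disulfide oxidoreductase (por) {H. influenzae}
DNA processing chain A (dprA) {E. coli}
Elongation factor EF-Ts (tsf) {E. coli}
Elongation factor EF-Tu (duplicate) (tufB) {E. coli}
Elongation factor EF-Tu (duplicate) (tufB) {E. coli}
Elongation factor G (fusA) {E. coli}
Elongation factor P (efp) {E. coli}
Glutamate-ammonia-ligase adenylyltransferase (glnE) {E. coli}
Initiation factor 3 (infC) {E. coli}
Initiation factor IF-1 (infA) {E. coli}
Initiation factor IF-2 (infB) {E. coli}
Maturation of antibiotic MccB17 (pmbA)
Methionine aminopeptidase (map) {E. coli}
Oxidoreductase (dsbB) {E. coli}
Peptide chain release factor 2 (prfB) {S. typhimurium}
Peptide chain release factor 3 (prfC) {E. coli}
Peptidyl-prolyl cis-trans isomerase B (ppiB) {E. coli}
Polypeptide chain release factor 1 (prfA) {S. typhimurium}
Polypeptide deformylase (formylmethionine deformylase) (def) {E. coli}
Ribosome releasing factor (frr) {E. coli}
Rotamase, peptidyl-prolyl cis-trans isomerase (slyD) {E. coli}
Rotamase, peptidyl-prolyl cis-trans isomerase (slyD) {E. coli}
Transcription elongation factor (greA) {E. coli}
Translation factor (selB) {E. coli}
xprA protein (xprA) {E. coli}
Degradation of proteins, peptides, and glycopeptides
Aminopeptidase A (pepA) {Rickettsia prowazekii}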
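Aminopeptidase A/I (pepA) {E. coli}
Aminopeptidase N (pepN) {E. coli}
Aminopeptidase P (pepP) {E. coli}
ATP-dependent protease proteolytic component (clpP) {E. coli}
ATP-dependent protease ATPase subunit (clpX) {E. coli}
ATP-dependent protease binding subunit (clpB) {E. coli}
Collagenase (prtC) {Porphyromonas gingivalis}
HflC protein (hflC) {E. coli}
IgA1 protease (iga1) {H. influenzae}
IgA1 protease (iga1) {H. influenzae}
IgA1 protease (iga1) {H. influenzae}
lon protease (lon) {Bacillus brevis}
Oligopeptidase A (prlC) {E. coli}
Peptidase D (pepD) {E. coli}
Peptidase E (pepE) {E. coli}
Peptidase T (pepT) {S. typhimurium}
Periplasmic serine protease Do and heat shock protein (htrA) {E. coli}
Probable ATP-dependent protease (sms) {E. coli}
Proline dipeptidase (pepQ) {E. coli}
Protease (prtH) {Porphyromonas gingivalis}
Protease IV (sppA) {E. coli}
Protease specific for phage lambda cII repressor (hflK) {E. coli}
Putative protease (sohB) {E. coli}
Sialoglycoprotease (gcp) {Pasteurella haemolytica}

Transport/binding proteins
Amino acids, peptides, and amines
Arginine transport ATP-binding protein ArtP (artP) {E. coli}
Arginine transport system permease protein (artM) {E. coli}
Arginine transport system permease protein (artQ) {E. coli}
Biopolymer transport protein (exbB) {H. influenzae}
Biopolymer transport protein (exbD) {E. coli}
Branched-chain amino acid transport system II carrier protein (braB) {Pseudomonas aeruginosa}
D-alanine permease (dagA) {Alteromonas haloplanktis}
Dipeptide transport ATP-binding protein (dppD) {E. coli}
Dipeptide transport ATP-binding protein (dppF) {E. coli}
Dipeptide transport system permease protein (dppB) {E. coli}
Dipeptide transport system permease protein (dppB) {E. coli}
Dipeptide transport system permease protein (dppC) {E. coli}
Glutamate permease (gltS) {E. coli}
Glutamine transport system permease protein (glnP) {E. coli}
Glutamine-binding periplasmic protein (glnH) {E. coli}
Leucine-specific transport protein (livG) {E. coli}
Membrane-associated component, LIV-II transport system (brnQ) {S. typhimurium}
Oligopeptide-binding protein (oppA) {E. coli}
Oligopeptide-binding protein (oppA) {E. coli}
Oligopeptide transport ATP-binding protein (oppD) {S. typhimurium}
Oligopeptide transport ATP-binding protein (oppF) {S. typhimurium}
Oligopeptide transport system permease protein (oppC) {S. typhimurium}
Peptide transport periplasmic protein (sapA) {S. typhimurium}
Peptide transport system ATP-binding protein (sapD) {S. typhimurium}
Dipeptide transport system permease protein (dppC) {E. coli}
Peptide transport system permease protein (sapB) {S. typhimurium}
Periplasmic arginine-binding protein (artI) {Pasteurella haemolytica}
Proton glutamate symport protein (gltP) {Bacillus caldotenax}
Putrescine transport protein (potE) {E. coli}
Serine transporter (sdaC) {E. coli}
Spermidine/putrescine transport ATP-binding protein (potA) {E. coli}
Spermidine/putrescine transport system permease protein (potB) {E. coli}
Spermidine/putrescine transport system permease protein (potC) {E. coli}
Spermidine/putrescine transport system permease protein (potD) {E. coli}
Spermidine/putrescine transport system permease protein (potD) {E. coli}
Tryptophan-specific permease (mtr) {E. coli}
Tyrosine-specific transport protein (tyrP) {E. coli}
Tyrosine-specific transport protein (tyrP) {E. coli}
Cations
Bacterioferritin comigratory protein (bcp) {E. coli}
Ferric enterobactin transport ATP-binding protein (fepC) {E. coli}
Ferric enterobactin transport ATP-binding protein (fepC) {E. coli}
Ferrichrome-iron receptor (fhuA) {E. coli}
Ferritin-like protein (rsgA) {E. coli}
Ferritin-like protein (rsgA) {E. coli}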
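Iron(III) dicitrate transport ATP-binding protein FecE {E. coli}
Iron(III) dicitrate transport system permease protein (fecD) {E. coli}
Magnesium and cobalt transport protein (corA) {E. coli}
Major ferric iron-binding protein precursor (fbp) {Neisseria gonorrhoeae}
Mercuric transport protein (merT) {Pseudomonas aeruginosa}
Mercury scavenger protein (merP) {Pseudomonas fluorescens}
Mercury scavenger protein (merP) {Pseudomonas fluorescens}
Molybdate-binding periplasmic protein precursor (modB) {Azotobacter vinelandii}
Na+/H+ antiporter 1 (nhaA) {E. coli}
Na+/H+ antiporter 1 (nhaB) {E. coli}
Na+/H+ antiporter 1 (nhaC) {Bacillus firmus}
Periplasmic-binding-protein-dependent iron transport protein (sfuB) {Serratia marcescens}
Periplasmic-binding-protein-dependent iron transport protein (sfuC) {Serratia marcescens}
Potassium efflux system (kefC) {E. coli}
Potassium/copper-transporting ATPase A (copA) {Enterococcus faecalis}
Sodium/proline symporter (proline permease) (putP) {E. coli}
tonB protein (tonB) {H. influenzae}
TRK system potassium uptake protein (trkA) {E. coli}
Carbohydrates, organic alcohols, and acids
2-oxoglutarate/malate translocator (SODiT1) {Spinacia oleracea}
D-galactose-binding periplasmic protein (mglB) {E. coli}
D-xylose transport ATP-binding protein (xylG) {E. coli}
D-xylose-binding periplasmic protein (rbsB) {E. coli}
Enzyme I (ptsI) {S. typhimurium}
Formate transporter (formate channel) {E. coli}
Fructose-permease IIA/FPR component (fruB) {E. coli}
Fructose-permease IIBC component (fruA) {E. coli}
Fucose operon protein (fucU) {E. coli}
glpF protein (glpF) {E. coli}
glpF protein (glpF) {E. coli}
Gluconate permease (gntP) {B. subtilis}
Glucose phosphotransferase enzyme III-glc (crr) {E. coli}
Glycerol-3-phosphate transporter (glpT) {E. coli}
High-affinity ribose transport protein (rbsA) {E. coli}
High-affinity ribose transport protein (rbsC) {E. coli}
High-affinity ribose transport protein (rbsD) {E. coli}
L-fucose permease (fucP) {E. coli}
L-lactate permease (lctP) {E. coli}
Lactam utilization protein (lamB) {Emericella nidulans}
mglA protein (mglA) {E. coli}
mglC protein (mglC) {E. coli}
Periplasmic ribose-binding protein (rbsB) {E. coli}
Phosphohistidinoprotein-hexose phosphotransferase (ptsH) {E. coli}
Potassium channel homolog (kch) {E. coli}
Putative aspartate transport protein (dcuA) {E. coli}
Putative aspartate transport protein (dcuA) {E. coli}
Ribose transport permease protein (xylH) {E. coli}
Sodium- and chloride-dependent GABA transporter {Homo sapiens}
Sodium-dependent noradrenaline transporter {Homo sapiens}
Nucleosides, purines, and pyrimidines
Ribonucleotide transport ATP-binding protein (mkl) {Mycobacterium leprae}
Uracil permease (uraA) {E. coli}
Anions
Cysteine synthetase (cysZ) {E. coli}
Hydrophilic membrane-bound protein (modC) {E. coli}
Hydrophobic membrane-bound protein (modB) {E. coli}
Integral membrane protein (pstA) {E. coli}
Nitrate transport ATPase component (nasD) {Klebsiella pneumoniae}
Peripheral membrane protein B (pstB) {E. coli}
Peripheral membrane protein C (pstC) {E. coli}
Periplasmic phosphate-binding protein (pstS) {E. coli}
Periplasmic phosphate-binding protein (pstS) {E. coli}
Phosphate permease (YBR296C) {Saccharomyces cerevisiae}
Other
ATP-dependent translocator homolog (msbA) {H. influenzae}
ATP-binding protein (abc) {E. coli}
Cystic fibrosis transmembrane conductance regulator {Bos taurus}
Heme-binding lipoprotein (dppA) {H. influenzae}
Heme-hemopexin-binding protein (hxuA) {H. influenzae}
Hemin permease (hemU) {Yersinia enterocolitica}
High-affinity choline transport protein (betT) {E. coli}
Lactoferrin-binding protein (lbpA) {Neisseria meningitidis}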
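Na+/sulfate cotransporter {Rattus norvegicus}
Pantothenate permease (panF) {E. coli}
Transferrin-binding protein 1 precursor (tbp1) {Neisseria meningitidis}
Transferrin-binding protein 1 precursor (tbp1) {Neisseria meningitidis}
Transferrin-binding protein 1 precursor (tbp1) {Neisseria meningitidis}
Transferrin-binding protein 2 precursor (tbp2) {Neisseria meningitidis}
Transferrin-binding protein (tfbA) {Actinobacillus pleuropneumoniae}
Transferrin-binding protein 1 (tbp1) {Neisseria meningitidis}
Transferrin-binding protein 1 (tbp2) {Neisseria meningitidis}
Transport ATP-binding protein (cydD) {E. coli}
Transport ATP-binding protein (cydD) {E. coli}

Cellular processes
Chaperones
Chaperonin (groES) (mopB) {E. coli}
Heat shock protein (groEL) (mopA) {Haemophilus ducreyi}
Heat shock protein (dnaJ) {E. coli}
Heat shock protein C62.5 (htpG) {E. coli}
hsc66 protein (hsc66) {E. coli}
hsp70 protein (dnaK) {E. coli}
Cell division
Cell division ATP-binding protein (ftsE) {E. coli}
Cell division inhibitor (sulA) {Vibrio cholerae}
Cell division protein (ftsA) {E. coli}
Cell division protein (ftsH) {E. coli}
Cell division protein (ftsH) {E. coli}
Cell division protein (ftsJ) {E. coli}
Cell division protein (ftsL) {E. coli}
Cell division protein (ftsQ) {E. coli}
Cell division protein (ftsW) {E. coli}
Cell division protein (ftsY) {E. coli}
Cell division protein (ftsZ) {E. coli}
Cell division protein (mukB) {E. coli}
Cytoplasmic axial filament protein (cafA) {E. coli}
ftsX protein (ftsX) {E. coli}
mukB suppressor protein (smbA) {E. coli}
Penicillin-binding protein 3 (ftsI) {E. coli}
Protein and peptide secretion
GTP-binding membrane protein (lepA) {E. coli}
Colicin V secretion ATP-binding protein (cvaB) {E. coli}
Lipoprotein signal peptidase (lspA) {E. coli}
Peptide transport system ATP-binding protein SapF (sapF) {E. coli}
Preprotein translocase (secE) {E. coli}
Preprotein translocase SecY subunit (secY) {E. coli}
Protein-export membrane protein (secD) {E. coli}
Protein-export membrane protein (secF) {E. coli}
Protein-export membrane protein (secG) {E. coli}
Protein-export protein (secB) {E. coli}
secA protein (secA) {E. coli}
Signal peptidase I (lepB) {E. coli}
Signal recognition particle protein (54 homolog) (ffh) {E. coli}
Trigger factor (tig) {E. coli}
Type 4 prepilin-like protein-specific leader peptidase (hopD) {E. coli}
xcpS protein (xcpS) {Pseudomonas putida}
Detoxification
KW20 catalase (hktE) {H. influenzae}
Superoxide dismutase (sodA) {H. influenzae}
Thiophene and furan oxidation protein (thdF) {E. coli}
Cell killing
Hemolysin (tlyC) {Serpulina hyodysenteriae}
Hemolysin, 21 kDa (hly) {Actinobacillus pleuropneumoniae}
Killing protein (kicA) {E. coli}
Killing protein suppressor (kicB) {E. coli}
Leukotoxin secretion ATP-binding protein (lktB) {Actinobacillus actinomycetemcomitans}
Transformation
com101A protein (comF) {H. influenzae}
Competence locus E (comE1) {B. subtilis}
tfoX protein (tfoX) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_1) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_10) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_2) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_3) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_4) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_5) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_6) (com) {H. influenzae}
Transformation gene cluster hypothetical protein (GB:M62809_7) (com) {H. influenzae}

Other categories
Colicin-related functions
Colicin tolerance protein (tolB) {E. coli}
Colicin V production protein (pur regulon) (cvpA) {E. coli}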
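Inner membrane protein (tolQ) {E. coli}
Inner membrane protein (tolR) {E. coli}
Outer membrane integrity protein (tolA) {E. coli}
Outer membrane integrity protein (tolA) {E. coli}
Phage-related functions and prophages
E16 protein (muE16) {bacteriophage Mu}
G protein (muG) {bacteriophage Mu}
G protein (muG) {bacteriophage Mu}
gam protein {bacteriophage Mu}
Heat shock protein B25.3 (grpE) {E. coli}
Host factor-I (HF-I) (hfq) {E. coli}
I protein (muI) {bacteriophage Mu}
MuB protein (muB) {bacteriophage Mu}
N protein (muN) {bacteriophage Mu}
P protein {bacteriophage Mu}
Terminase subunit 1 {bacteriophage Sf6}
Transposase A (muA) {bacteriophage Mu}
Transposon-related functions
Insertion sequence IS1016 (V-4) hypothetical protein (GB:X58176_2) {H. influenzae}
IS1016-V6 protein (IS1016-V6) {H. influenzae}
IS1016-V6 protein (IS1016-V6) {H. influenzae}
IS1016-V6 protein (IS1016-V6) {H. influenzae}
Drug/analog sensitivity
Acriflavine resistance protein (acrB) {E. coli}
ampD signaling protein (ampD) {E. coli}
Bicyclomycin resistance protein (bcr) {E. coli}
Mercury resistance regulatory protein (merR2) {Thiobacillus ferrooxidans}
Modulator of drug activity (mda66) {E. coli}
Multidrug resistance protein (emrB) {E. coli}
Multidrug resistance protein (emrA) {E. coli}
Multidrug resistance protein (mdl) {E. coli}
Nodulation protein T (nodT) {Rhizobium leguminosarum}
rRNA (adenosine-N6,N6-)-dimethyltransferase (ksgA) {E. coli}
Tellurite resistance protein (tehA) {E. coli}
Tellurite resistance protein (tehB) {E. coli}
Radiation sensitivity protein (radC) {E. coli}
Adaptations, atypical conditions
Autotrophic growth protein (aut) {Alcaligenes eutrophus}
Heat shock protein (htpX) {E. coli}
Heat shock protein B (ibpB) {E. coli}
htrA-like protein (htrH) {E. coli}
Invasion protein (invA) {Bartonella bacilliformis}
NAD(P)H:menadione oxidoreductase {Mus musculus}
Survival protein (surA) {E. coli}
uspA protein (uspA) {E. coli}
Virulence plasmid protein (vagC) {Salmonella dublin}
Virulence-associated protein A (vapA) {Dichelobacter nodosus}
Virulence-associated protein C (vapC) {Dichelobacter nodosus}
Virulence-associated protein C (vapC) {Dichelobacter nodosus}
Virulence-associated protein D (vapD) {Dichelobacter nodosus}
Virulence plasmid protein (mlgA) {Shewanella colwelliana}
Unidentified 15 kDa protein (P15) {E. coli}
2-hydroxyacid dehydrogenase homolog (ddh) {Zymomonas mobilis}
Beta-lactamase regulation homolog (mazG) {E. coli}
Conjugal transfer corepressor (finO) {E. coli}
Delta-1-pyrroline-5-carboxylate reductase (proC) {Pseudomonas aeruginosa}
devA protein (devA) {Anabaena sp.}
devB protein (devB) {Anabaena sp.}
Embryo-abundant protein, group 3 {Triticum aestivum}
Extragenic suppressor (suhB) {E. coli}
GCPE protein (protein E) (gcpE) {E. coli}
GerC2 protein (gerC2) {B. subtilis}
glpX protein (glpX) {E. coli}
Glyoxylate-induced protein {E. coli}
hslU protein (hslU) {E. coli}
hslV protein (hslV) {E. coli}
ilv-related protein {E. coli}
Isochorismate synthase (entC) {B. subtilis}
Membrane-associated ATPase (cbiO) {Propionibacterium freudenreichii}
Membrane protein (lapB) {Pasteurella haemolytica}
Membrane protein (lapB) {Pasteurella haemolytica}
N-carbamyl-L-amino acid amidohydrolase {Bacillus stearothermophilus}
Nitrogen fixation protein (nifS) {Anabaena sp.}
Nitrogen fixation protein (nifS) {Mycobacterium leprae}
Nitrogen fixation protein (nifS) {Mycobacterium leprae}
Nitrogen fixation protein (nifU) {Klebsiella pneumoniae}
Nitrogen fixation protein (rnfE) {Rhodobacter capsulatus}
Nitrogen fixation protein (rnfE) {Rhodobacter capsulatus}
Nitrogenase C (nifC) {Clostridium pasteurianum}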
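Nitrogenase C (nifC) {Clostridium pasteurianum}
nmt1 protein (nmt1) {Aspergillus pasteurianum}
Partition system protein (parB) {plasmid RP4}
rarD protein (rarD) {E. coli}
rarD protein (rarD) {E. coli}
skp protein (skp) {Pasteurella multocida}
Small protein (smpB) {E. coli}
spoIIIE protein (spoIIIE) {Coxiella burnetii}
Suppressor protein (msgA) {E. coli}
Surfactin (sfpo) {B. subtilis}
toxR regulon (tagD) {Vibrio cholerae}
traN protein (traN) {plasmid RP4}
Transport ATP-binding protein (cydC) {E. coli}
Transport ATP-binding protein (cydC) {E. coli}
vanH protein (vanH) {transposon Tn1546}
Mucoid status locus protein (mucB) {Pseudomonas aeruginosa}
Phenol hydroxylase (ORF6) {Acinetobacter calcoaceticus}
Plasma protease C1 inhibitor {Homo sapiens}

Known
ATP-dependent translocator homolog (msbA)
Outer membrane protein P2 (ompP2)
Single-stranded DNA-binding protein (ssb)
tonB protein (tonB)
Heme-hemopexin-binding protein (hxuA)
Adenylate kinase (ATP-AMP transphosphorylase) (adk)
Hypothetical protein (SP:P24326)
UDP-glucose 4-epimerase (galactowaldenase) (galE)
Hypothetical protein (SP:P24324)
PC protein (15 kD peptidoglycan-associated outer membrane lipoprotein) (pal)
Outer membrane protein P1 (ompP1)
Transformation gene cluster hypothetical protein (GB:M62809_7) (com)
Transformation gene cluster hypothetical protein (GB:M62809_6) (com)
Transformation gene cluster hypothetical protein (GB:M62809_5) (com)
Transformation gene cluster hypothetical protein (GB:M62809_4) (com)
Transformation gene cluster hypothetical protein (GB:M62809_3) (com)
Transformation gene cluster hypothetical protein (GB:M62809_2) (com)
Transformation gene cluster hypothetical protein (GB:M62809_1) (com)
HincII endonuclease (HincII)
Modification methylase HincII (hincIIM)
Lipooligosaccharide biosynthesis protein
Streptomycin resistance protein (strA)
Recombinase (recA)
tfoX protein (tfoX)
Adenylate cyclase (cyaA)
28 kDa membrane protein (hlpA)
Protein D (hpd)
Lipoprotein (hel)
Aldose 1-epimerase precursor (mutarotase) (mro)
Galactokinase (galK)
Galactose-1-phosphate uridylyltransferase (galT)
Galactose operon repressor (galS)
Hypothetical protein (GB:M94205_1)
Disulfide oxidoreductase (por)
Heme-binding lipoprotein (dppA)
Protective surface antigen D15
KW20 catalase (hktE)
Cyclic AMP receptor protein (crp)
Superoxide dismutase (sodA)
Outer membrane protein P5 (ompA)
DNA helicase II (uvrD)
HindIII modification methyltransferase (hindIIIM)
HindIII restriction endonuclease (hindIIIR)
DNA polymerase III, chi subunit (holC)
lic-1 operon protein (licC)
lic-1 operon protein (licD)
15 kD peptidoglycan-associated lipoprotein (lpp)
Formyltetrahydrofolate hydrolase (purU)
Enolpyruvylshikimate phosphate synthase (aroA)
lsg locus hypothetical protein (GB:M94855_8)
lsg locus hypothetical protein (GB:M94855_7)
lsg locus hypothetical protein (GB:M94855_6)
lsg locus hypothetical protein (GB:M94855_5)
lsg locus hypothetical protein (GB:M94855_4)
lsg locus hypothetical protein (GB:M94855_3)
lsg locus hypothetical protein (GB:M94855_2)
lsg locus hypothetical protein (GB:M94855_1)

DETAILED DESCRIPTION OF THE INVENTION
Nucleotide sequence of the Haemophilus influenzae Rd genome, fragments thereof, and uses thereof
Part of the work performed during the development of the present invention was supported by U.S. government funds.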
The government may have certain rights in the present invention. NIH-5R01GM48251

Field of the Invention
The present invention relates to the field of molecular biology. It discloses the nucleotide sequence of Haemophilus influenzae, compositions comprising fragments thereof, and their uses in industrial fermentation and pharmaceutical development.

Background of the Invention
The complete genomic sequence of a free-living cellular organism has not yet been determined. The first Mycobacterium sequence is expected to be completed by 1996, and E. coli and S. cerevisiae are expected to be completed before 1998. These projects are being carried out by random and/or directed sequencing of overlapping cosmid clones. No one has attempted to determine a sequence on the order of one megabase or more by the random shotgun method.

Haemophilus influenzae is a small (approximately 0.4 × 1 micron), nonmotile, non-spore-forming, Gram-negative bacterium whose only natural host is humans. It resides in the upper respiratory mucosa of children and adults and causes otitis media and respiratory tract infections, mostly in children. The most serious complication is meningitis, which produces neurological sequelae in up to 50% of affected children. Six H. influenzae serotypes (a through f) have been identified on the basis of immunologically distinct capsular polysaccharide antigens. Numerous nontypeable strains are also known. Serotype b accounts for the majority of human disease.

Interest in the medically important features of H. influenzae biology has focused particularly on the genes that determine the organism's virulence characteristics. A number of genes contributing to the capsular polysaccharide have been mapped and sequenced (Kroll et al., Mol. Microbiol. 5(6):1549-1560 (1991)). Several outer membrane protein (OMP) genes have been identified and sequenced (Langford et al., J. Gen. Microbiol. 138:155-159 (1992)).
The lipooligosaccharide (LOS) component of the outer membrane and the genes of its synthetic pathway have been studied intensively (Weiser et al., J. Bacteriol. 172:3304-3309 (1990)). Although vaccines have been available since 1984, research on outer membrane components has been motivated to some extent by the demand for improved vaccines. Recently, the catalase gene was characterized and sequenced as a potential virulence-related gene (Bishai et al., in press). Elucidation of the H. influenzae genome will increase understanding of how H. influenzae causes invasive disease and how infection can best be combated.

H. influenzae has a highly efficient natural DNA transformation system, which has been studied intensively in the unencapsulated (R), serotype d strain (Kahn and Smith, J. Membrane Biology 81:89-103 (1984)). At least 16 transformation-specific genes have been identified and sequenced. Of these, four are regulatory genes (Redfield, J. Bacteriol. 173:5612-5618 (1991); Chandler, Proc. Natl. Acad. Sci. USA 89:1626-1630 (1992)), at least two are involved in the recombination process (Barouki and Smith, J. Bacteriol. 163(2):629-634 (1985)), and at least seven are directed to the membrane and periplasmic space (Tomb et al., Gene 104:1-10 (1991); Tomb, Proc. Natl. Acad. Sci. USA 89:10252-10256 (1992)), where they appear to function as structural components or in the assembly of the DNA transport machinery. Transformation of H. influenzae Rd shows a number of interesting features: sequence-specific DNA uptake; rapid uptake of several double-stranded DNA molecules per cell into membrane compartments called transformasomes; linear translocation of single-stranded donor DNA into the cytoplasm; and pairing and replacement of the single strand with the chromosome by a single-strand displacement mechanism.
The H. influenzae Rd transformation system is the most thoroughly studied Gram-negative system, and it differs from Gram-positive systems. The size of the H. influenzae Rd genome was determined by pulsed-field agarose gel electrophoresis to be approximately 1.9 Mb, about 40% of the size of the E. coli genome (Lee and Smith, J. Bacteriol. 170:4402-4405 (1988)). The restriction map of H. influenzae is circular (Lee et al., J. Bacteriol. 171:3016-3024 (1989); Redfield and Lee, "Haemophilus influenzae Rd", pp. 2110-2112, in O'Brien, S.J. (ed.), Genetic Maps: Locus Maps of Complex Genomes, Cold Spring Harbor Press, New York). Various genes have been mapped to restriction fragments by Southern hybridization of probes to restriction-digested DNA bands. This map would be valuable for verifying the assembly of a complete genome sequence from randomly sequenced fragments. GenBank currently contains about 100 kb of nonredundant H. influenzae DNA sequence, about half from serotype b and half from Rd.

Summary of the Invention
The present invention is based on the sequencing of the H. influenzae Rd genome. The nucleotide sequence obtained is provided in SEQ ID NO: 1. The present invention provides the nucleotide sequence of the Rd genome, or representative fragments thereof, in a form that can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence depicted in SEQ ID NO: 1. The invention further provides nucleotide sequences that are at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.
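The text does not say how "at least 99.9% identical" is measured (alignment method, treatment of gaps). The following is a minimal sketch that assumes two pre-aligned, ungapped sequences of equal length and counts matching positions; the function names and sequences are hypothetical. At the approximately 1.9 Mb genome size cited above, a 99.9% threshold would tolerate roughly 1,900 differing positions.

```python
# Sketch: percent identity of two pre-aligned, equal-length sequences.
# Assumption: no gaps and a simple position-by-position comparison;
# the patent text does not define the identity calculation.

def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which a and b carry the same base."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def at_least_identical(a: str, b: str, threshold: float = 99.9) -> bool:
    """True if the identity of a and b meets the threshold (in percent)."""
    return percent_identity(a, b) >= threshold


reference = "ACGT" * 250            # hypothetical 1,000-base reference
one_change = "T" + reference[1:]    # 1 substitution  -> 99.9 %
two_changes = "TT" + reference[2:]  # 2 substitutions -> 99.8 %
print(at_least_identical(reference, one_change))   # True
print(at_least_identical(reference, two_changes))  # False
```

A gapped alignment (for example, from a BLAST or FASTA search) would be needed before applying such a count to sequences that differ by insertions or deletions.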
The nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1 can be provided in a variety of media to facilitate its use. In one such embodiment, the sequences of the invention are recorded on computer-readable media. Such media include, but are not limited to: magnetic storage media, such as floppy disks, hard disk storage media, and magnetic tape; optical storage media, such as CD-ROM; electrical storage media, such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media.

The invention further provides systems, in particular computer-based systems, that contain the sequence information described herein stored in a data storage means. Such systems are designed to identify commercially important fragments of the Rd genome.

Another embodiment of the present invention is directed to isolated fragments of the H. influenzae Rd genome. The fragments of the H. influenzae Rd genome of the present invention include, but are not limited to, fragments that encode peptides (hereinafter open reading frames, or ORFs), fragments that modulate the expression of an operably linked ORF (hereinafter expression modulating fragments, or EMFs), fragments that mediate the uptake of a linked DNA fragment into a cell (hereinafter uptake modulating fragments, or UMFs), and fragments that can be used to diagnose the presence of H. influenzae Rd in a sample (hereinafter diagnostic fragments, or DFs).

Each of the ORF fragments of the H. influenzae Rd genome disclosed in Tables 1(a) and 2, and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. These sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample,
As a diagnostic probe or diagnostic amplification primer for Use to manufacture products and to selectively control gene expression Can be. The invention further includes one or more of the Rd genomes of Haemophilus influenzae of the invention. And recombinant constructs containing the fragment. The recombinant construct of the present invention A vector into which a fragment of Haemophilus influenzae Rd is inserted, for example, Contains a sumid or viral vector. The present invention further provides for the isolation of isolated fragments of the Rd genome of Haemophilus influenzae of the present invention. A host cell containing one of the following is provided. These host cells are Can be a higher eukaryotic host, such as a vesicle, or a lower eukaryotic cell, such as a yeast cell. Or prokaryotic cells such as bacterial cells. The present invention is further directed to an isolated protein encoded by the ORF of the present invention. ing. The protein of the present invention may be prepared using various oils known in the art. Can be obtained. At the simplest level, it is commercially available Amino acid sequences can be synthesized using any suitable peptide synthesizer. Alternative In the method, the protein is purified from bacterial cells that naturally produce the protein. Most Later, the protein of the present invention has been modified to express the desired protein Alternatively, it can be purified from cells. The present invention further provides a homologue of a fragment of Rd geno 4 of the Haemophilus influenzae of the present invention. And a method for obtaining a homolog of the protein encoded by the ORF of the present invention. provide. Specifically, the nucleotide and amino acid sequences disclosed herein are Use as lobes or primers and perform PCR cloning By using techniques such as Lark hybridization, Can obtain homologs. The invention further provides an antibody that selectively binds to one of the proteins of the invention. . 
Such antibodies include both monoclonal and polyclonal antibodies. The invention further provides hybridomas that produce the above antibodies. A hybridoma is an immortalized cell line capable of secreting a specific monoclonal antibody.

The invention further provides methods of identifying test samples derived from cells that express one of the ORFs of the present invention or a homolog of that ORF. In these methods, a test sample is incubated with one or more antibodies of the present invention, or with one or more DFs of the present invention. The incubation conditions allow a skilled artisan to determine whether the sample contains the ORF or a product produced from it.

In another embodiment of the present invention, kits are provided that contain the reagents needed to carry out the above assays. Specifically, the invention provides a compartmentalized kit that receives one or more containers in close confinement:
(a) a first container holding one of the antibodies or one of the DFs of the present invention;
(b) one or more other containers holding one or more reagents capable of detecting the presence of bound antibody or hybridized DF.

Using the isolated proteins of the present invention, the invention further provides methods of obtaining and identifying agents that can bind a protein encoded by one of the ORFs of the present invention. Such agents include, in particular, antibodies (described above), peptides, carbohydrates and pharmaceutical agents. These methods comprise the following steps:
(a) contacting an agent with an isolated protein encoded by one of the ORFs of the invention;
(b) determining whether the agent binds to that protein.
The complete genome sequence of H. influenzae will be of great value to all laboratories that work with this organism, and for a variety of commercial purposes. Many fragments of the H. influenzae Rd genome can be identified immediately by similarity searches against GenBank or protein databases. These fragments will be of immediate value to Haemophilus researchers, and of immediate commercial value for producing proteins or controlling gene expression.

PHA synthase is one specific example. Polyhydroxybutyrate is present in the membranes of H. influenzae Rd, and its amount has been reported to correlate with the level of competence required for transformation. PHA synthase, the enzyme that synthesizes this polymer, has been identified and sequenced in a number of bacteria, but none of these bacteria is evolutionarily close to H. influenzae. The gene has not yet been isolated from H. influenzae with hybridization probes or PCR techniques. The genomic sequence of the present invention, used with the search means described herein, nevertheless allows this gene to be identified.

The methodologies and technologies developed to elucidate the complete genome sequences of bacteria and other small genomes will greatly enhance the ability to analyze chromosome organization. In particular, the sequenced genome will serve as a model for developing tools to analyze chromosome structure and function. These include tools for the following tasks:
- identifying genes within large fragments of genomic DNA;
- determining the structure, position and spacing of regulatory elements;
- identifying genes with potential industrial applications;
- performing comparative genomics and molecular phylogeny.

Description of the Drawings

Figure 1 - Restriction map of the H. influenzae Rd genome.

Figure 2 - Block diagram of a computer system that can be used to implement the computer-based system of the present invention.
Figure 3 - Experimental coverage from up to approximately 4000 random sequence fragments assembled with AutoAssembler (squares), compared with the Lander-Waterman prediction for a 2.5 Mb genome (triangles) and a 1.6 Mb genome (circles). The comparison assumes an average sequence length of 460 bp and a 25 bp overlap.

Figure 4 - Data flow and computer programs used to manage, assemble, edit and annotate the H. influenzae genome. AB373 sequence data files are handled on both Macintosh and Unix platforms (Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington DC, 585 (1993)). The programs are as follows:
- Factura (AB) is a Macintosh program designed for automatic removal of vector sequence from sequence files and for end trimming.
- The program esp runs on the Macintosh platform. It writes feature data extracted from the sequence files to the Unix-based H. influenzae relational database.
- Assembly starts with stp, a program that provides an X-window graphical interface and a control program. Using user-specified or standard SQL queries, stp retrieves a specific set of sequence files and their associated features from the H. influenzae database.
- The sequence files are then assembled with the TIGR Assembler, an assembly tool designed at TIGR to assemble thousands of sequence fragments quickly and accurately.
- The TIGR Editor is a graphical interface that builds aligned sequence files from the TIGR Assembler output. It displays the aligned sequences and their associated electropherograms for continuous editing.
- Putative coding regions are identified with GeneMark (Borodovsky and McIninch, Computers Chem. 17(2): 123 (1993)), a Markov and Bayesian modeling program that predicts gene locations. GeneMark is directed to the H. influenzae sequence data set.
- Peptide searches use BLAZE (Brutlag et al., Computers Chem. 17: 203 (1993)), run on a MasPar MP-2 massively parallel computer with 4096 microprocessors. The searches cover all three reading frames of each coding region predicted by GeneMark.
- The program mblzt combines the results from each reading frame into a single output file.
- The program praze produces optimal protein alignments by extending alignments across potential frameshifts.
- The output is reviewed with gbyob, a graphical display program that interacts directly with the H. influenzae database. These alignments are used to identify potential frameshift errors and to target further editing.

Figure 5 - Circular representation of the H. influenzae Rd chromosome. The figure shows the location of each predicted coding region that matches the database, together with selected general features of the genome. It is organized as follows:
- Outer perimeter: the unique NotI restriction site (designated nucleotide 1), the RsrII sites and the SmaI sites.
- Outer concentric circle: the position of each coding region with an identified gene. Each coding region is color-coded by its role, according to the color code in FIG.
- Second concentric circle: regions of high G/C content (>42%, red; >40%, blue) and high A/T content (>66%, black; >64%, green). The high G/C regions are associated in particular with the six ribosomal operons and the mu-like prophage.
- Third concentric circle: coverage by lambda clones (blue). More than 300 lambda clones were sequenced from each end to confirm the overall genome structure and to identify the six ribosomal operons.
- Fourth concentric circle: positions of the six ribosomal operons (green), the tRNAs (black) and the putative mu-like prophage (blue).
- Fifth concentric circle: simple tandem repeats.
The fifth circle shows the positions of the following repeats: CTGGCT, GTCT, ATT, AATGGC, TTGA, TTGG, TTTA, TTATC, TGAC, TCGTC, AACC, TTGG, CAAT and CCAA. The putative origin of replication is marked by outward-pointing arrows (green) that start near base 603,000. The two potential termination sequences (red) lie near the midpoint of the opposite side of the circle.

Figures 6(A)-6(D) - Complete map of the H. influenzae Rd genome. Predicted coding regions are shown on each strand, and the rRNA and tRNA genes are shown as lines and triangles. Genes are color-coded according to the role categories described in the legend. Gene identification (GeneID) numbers correspond to those in Tables 1(a), 1(b) and 2. Three-letter designations are also given where possible.

Figure 7 - Comparison of two chromosomal regions: the region of H. influenzae type b that contains the eight genes of the fimbrial gene cluster, and the same region in H. influenzae Rd. In both organisms the pepN and purE genes flank this region. In the non-infectious Rd strain, however, the eight genes have been excised, and a 172 bp spacer region lies between pepN and purE.

Figure 8 - Hydrophobicity analysis of five predicted channel proteins. The figure shows the amino acid sequences of five predicted coding regions that have no homology to known peptide sequences (GenBank release 87). Each of these sequences has a profile characteristic of channel-forming proteins. The predicted coding regions were analyzed with the Kyte-Doolittle algorithm (Kyte and Doolittle, J. Mol. Biol. 157: 105 (1982)), using a window of 11 residues, in the GeneWorks software package (IntelliGenetics).

Detailed Description of the Preferred Embodiment

The invention is based on the sequencing of the H. influenzae Rd genome.
The resulting nucleotide sequence is provided in SEQ ID NO: 1. As used herein, "primary sequence" refers to a nucleotide sequence written in IUPAC nomenclature. The sequence in SEQ ID NO: 1 is oriented relative to the unique NotI restriction endonuclease site of the H. influenzae Rd genome. A skilled artisan will readily recognize that this start/stop point was chosen arbitrarily and has no structural significance.

The present invention provides the nucleotide sequence of SEQ ID NO: 1, or representative fragments thereof, in a form that a skilled artisan can readily use, analyze and interpret. In one embodiment, the sequence is provided as a contiguous string of primary sequence information corresponding to the nucleotide sequence in SEQ ID NO: 1.

As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NO: 1" is any portion of SEQ ID NO: 1 that is not presently represented in a publicly available database. Preferred representative fragments of the present invention are open reading frames, expression modulating fragments, uptake modulating fragments, and fragments that can be used to diagnose the presence of H. influenzae in a sample. Tables 1(a) and 2 identify such preferred representative fragments, without limitation.

The nucleotide sequence information in SEQ ID NO: 1 was obtained by sequencing the H. influenzae Rd genome with a megabase shotgun sequencing method. Using the three accuracy parameters discussed in the Examples below, the inventors calculated that the sequence of SEQ ID NO: 1 is up to 99.98% accurate. The nucleotide sequence in SEQ ID NO: 1 is therefore not necessarily 100% correct, but it is a highly accurate representation of the Rd genome.
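Figure 3 compares the observed shotgun coverage with the Lander-Waterman prediction, and the accuracy figure above implies an error budget. The sketch below reproduces both as back-of-the-envelope arithmetic. It is not the inventors' own calculation. It uses the standard Lander-Waterman expectations, which assume that reads start uniformly at random. The function names are hypothetical.

```python
import math
from typing import Tuple


def lander_waterman(n_reads: int, read_len: int, genome_len: int,
                    min_overlap: int = 0) -> Tuple[float, float, float]:
    """Lander-Waterman expectations for a random shotgun project.

    Returns (redundancy c, expected fraction of bases covered,
    expected number of contigs). Reads are assumed to start uniformly at
    random; two reads are joined only if they overlap by >= min_overlap bp.
    """
    c = n_reads * read_len / genome_len          # fold redundancy
    covered = 1.0 - math.exp(-c)                  # P(a base is covered by >= 1 read)
    sigma = 1.0 - min_overlap / read_len          # usable fraction of a read for joining
    contigs = n_reads * math.exp(-c * sigma)      # expected number of contigs ("islands")
    return c, covered, contigs


def expected_errors(seq_len: int, accuracy: float) -> float:
    """Expected number of miscalled bases at a given per-base accuracy."""
    return seq_len * (1.0 - accuracy)


# Parameters from the Figure 3 legend (460 bp reads, 25 bp overlap, 1.6 Mb genome)
# and from the ~1.9 Mb size estimate and 99.98% accuracy stated in the text.
c, covered, contigs = lander_waterman(4000, 460, 1_600_000, min_overlap=25)
print(f"c = {c:.2f}, covered = {covered:.1%}, expected contigs ~ {contigs:.0f}")
print(f"expected errors at 99.98%: {expected_errors(1_900_000, 0.9998):.0f}")
print(f"differences allowed at 99.9%: {expected_errors(1_900_000, 0.999):.0f}")
```

With 4,000 reads of 460 bp on a 1.6 Mb genome, the redundancy is c = 1.15, so about 68% of the genome would be expected to be covered. At 99.98% accuracy, a 1.9 Mb sequence would carry about 380 miscalled bases. That is well within the 1,900 differences that a 99.9% identity threshold tolerates.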
As discussed in detail below, one of ordinary skill in the art can use the information in SEQ ID NO: 1 and Tables 1(a) and 2, together with routine cloning and sequencing methods, to clone and sequence any "representative fragment" of interest. This includes the open reading frames (ORFs) that encode a wide variety of H. influenzae proteins. In very rare cases, this work may reveal a sequencing error in the nucleotide sequence disclosed in SEQ ID NO: 1. Once the present invention is available (that is, once the information in SEQ ID NO: 1 and Tables 1(a) and 2 has been made available), resolving such a rare error will therefore be well within the skill of the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler™ can serve as an aid during visual inspection of nucleotide sequences.

Even after all of the very rare sequencing errors in SEQ ID NO: 1 have been corrected, the resulting nucleotide sequence will still be at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.

The genomic nucleotide sequences of different H. influenzae strains are known to vary slightly. Nevertheless, the genomic nucleotide sequence of every H. influenzae strain will be at least 99.9% identical to the nucleotide sequence provided in SEQ ID NO: 1. The invention therefore further provides nucleotide sequences that are at least 99.9% identical to SEQ ID NO: 1, in a form that a skilled artisan can readily use, analyze and interpret. Methods for determining whether a nucleotide sequence is at least 99.9% identical to SEQ ID NO: 1 are routine and readily available to the skilled artisan. For example, the well-known FASTA algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988)) can be used to obtain the percent identity between nucleotide sequences.

Computer-Related Embodiments

The nucleotide sequence provided in SEQ ID NO: 1 can be "provided" in a variety of media that facilitate its use. The same applies to representative fragments of it and to nucleotide sequences at least 99.9% identical to it. As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, that contains a nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1. Such a manufacture provides the Rd genome, or a subset of it (e.g., an H. influenzae Rd open reading frame (ORF)), in a form that a skilled artisan can examine with means that do not apply directly to the genome or subset as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium that a computer can read and access directly. Such media include, but are not limited to, the following:
- magnetic storage media, such as floppy discs, hard disc storage media and magnetic tape;
- optical storage media, such as CD-ROM;
- electrical storage media, such as RAM and ROM;
- hybrids of these categories, such as magnetic/optical storage media.

A skilled artisan can readily appreciate how any presently known computer readable medium can be used to create a manufacture that has a nucleotide sequence of the present invention recorded on it.

As used herein, "recorded" refers to the process of storing information on computer readable media. A skilled artisan can readily adopt any presently known method for recording information on computer readable media to create manufactures that contain the nucleotide sequence information of the present invention.
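The text leaves the recording format open. FASTA is one common plain-text (ASCII) format for sequence data. The minimal sketch below assumes that format. Its hypothetical helpers write a sequence to a text file and read it back unchanged.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def write_fasta(path: Path, header: str, seq: str, width: int = 60) -> None:
    """Record one sequence as a FASTA-formatted ASCII text file."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    path.write_text("\n".join(lines) + "\n", encoding="ascii")


def read_fasta(path: Path) -> Dict[str, str]:
    """Read every record of a FASTA file into a {header: sequence} dict."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in path.read_text(encoding="ascii").splitlines():
        if line.startswith(">"):
            if header is not None:          # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip().upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Round trip through a temporary file; "demo_fragment" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "demo_fragment.fa"
    demo_seq = "ACGT" * 40
    write_fasta(fasta_path, "demo_fragment", demo_seq)
    assert read_fasta(fasta_path) == {"demo_fragment": demo_seq}
```

A database table could hold the same records. The plain-text form is simply the easiest to inspect and to move between programs.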
A skilled artisan can use a variety of data storage structures to create a computer readable medium that has a nucleotide sequence of the present invention recorded on it. The choice of data storage structure will generally depend on the means chosen to access the stored information. A variety of data processor programs and formats can also be used to store the nucleotide sequence information of the present invention on computer readable media. For example, the sequence information can be held in a word processing text file formatted in commercially available software such as WordPerfect or Microsoft Word. It can also be held as an ASCII file stored in a database application such as DB2, Sybase or Oracle. A skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) to obtain a computer readable medium that has the nucleotide sequence information of the present invention recorded on it.

The nucleotide sequence of SEQ ID NO: 1 can be provided in computer readable form, as can its representative fragments and nucleotide sequences at least 99.9% identical to it. A skilled artisan can then routinely access the sequence information for a variety of purposes. Computer software that lets a skilled artisan access sequence information provided on computer readable media is publicly available. The Examples below use software that implements the BLAST (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17: 203-207 (1993)) search algorithms. They show how this software can identify open reading frames (ORFs) within the H. influenzae Rd genome that share homology with ORFs or proteins from other organisms. These ORFs are protein-encoding fragments of the H. influenzae Rd genome. They are useful for producing commercially important proteins, such as enzymes used in the manufacture of cereals.
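The ORFs described in this document were identified by BLAST/BLAZE homology searches and GeneMark predictions. The sketch below uses a different and much simpler technique, for orientation only: a six-frame scan for ATG-initiated stretches that end in a stop codon. It makes several assumptions. It uses the standard stop codons TAA, TAG and TGA, ignores alternative start codons, and ignores ORFs that span the arbitrary NotI origin of the circular chromosome. It also adopts the convention (an assumption here) of reporting reverse-strand ORFs with start > end. The function names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(genome: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of ATG..stop ORFs.

    Forward-strand ORFs have start < end; reverse-strand ORFs are reported
    with start > end. Coordinates include the stop codon. The genome is
    treated as linear, so ORFs wrapping around position 1 are missed.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            start = None  # 0-based index of the first ATG in the current open stretch
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:   # sense codons, ATG included
                        end = i + 3                      # exclusive end, stop included
                        if strand == "+":
                            hits.append((start + 1, end))
                        else:
                            # reverse-complement index j maps to forward index n-1-j
                            hits.append((n - start, n - end + 1))
                    start = None
    return sorted(hits, key=min)
```

A real gene finder adds a statistical coding model (as GeneMark does) to tell genuine genes from chance open frames. A simple length cutoff such as `min_codons` is only a crude substitute for that model.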
The present invention further provides systems, particularly computer-based systems, that contain the sequence information described herein. These systems are designed to identify commercially important fragments of the H. influenzae Rd genome.

As used herein, a "computer-based system" refers to the hardware means, software means and data storage means used to analyze the nucleotide sequence information of the present invention. At a minimum, the hardware of a computer-based system of the present invention comprises a central processing unit (CPU), input means, output means and data storage means. A skilled artisan can readily appreciate that any currently available computer-based system is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention comprise data storage means that hold a nucleotide sequence of the present invention. They also comprise the hardware and software means needed to support and implement a search means. As used herein, "data storage means" refers to memory that can store the nucleotide sequence information of the present invention. It also covers memory access means that can read manufactures on which that information is recorded.

As used herein, "search means" refers to one or more programs that run on the computer-based system to compare a target sequence or target structural motif with the sequence information in the data storage means. Search means are used to identify fragments or regions of the Rd genome that match a particular target sequence or target motif. A variety of algorithms are publicly known, and a variety of public and commercial software for conducting searches can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI).
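At its simplest, a search means is an exact-match scan of a target sequence against the stored genome. The sketch below reports matches on both strands. It also estimates how many matches a target of a given length would produce by chance in a random sequence with equal base frequencies. That is an idealizing assumption, because real genomes, including the A/T-rich H. influenzae genome, deviate from it. The function names are hypothetical. Homology tools such as BLASTN perform scored, gapped alignments rather than exact matching.

```python
import re
from typing import List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def search_target(genome: str, target: str) -> List[Tuple[int, str]]:
    """Return (1-based position, strand) for every exact match of target.

    Overlapping matches are reported. A reverse-strand hit is reported at
    the forward-strand position of the match's leftmost base.
    """
    genome, target = genome.upper(), target.upper()
    rc_target = target.translate(_COMP)[::-1]
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + re.escape(target) + ")")
    hits = [(m.start() + 1, "+") for m in pattern.finditer(genome)]
    if rc_target != target:  # a palindromic site would otherwise be counted twice
        rc_pattern = re.compile("(?=" + re.escape(rc_target) + ")")
        hits += [(m.start() + 1, "-") for m in rc_pattern.finditer(genome)]
    return sorted(hits)


def expected_random_hits(target_len: int, genome_len: int,
                         both_strands: bool = True) -> float:
    """Chance matches expected in a random sequence with equal base frequencies."""
    per_strand = max(genome_len - target_len + 1, 0) / 4 ** target_len
    return per_strand * (2 if both_strands else 1)
```

For example, a 6-nucleotide target would be expected to occur by chance roughly once per 4,096 positions on each strand. Short targets therefore generate many spurious matches in a genome of about 1.9 Mb.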
A skilled artisan can readily recognize that any available homology search algorithm, or any software package that implements one, can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA sequence of six or more nucleotides or any amino acid sequence of two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely it is to occur in the database by chance. The most preferred length of a target sequence is about 10 to 100 amino acids or about 30 to 300 nucleotide residues. It is well recognized, however, that commercially important fragments of the H. influenzae Rd genome can be shorter. Examples include sequence fragments involved in gene expression and protein processing.

As used herein, a "target structural motif" or "target motif" is any rationally selected sequence or combination of sequences. The sequence(s) are chosen on the basis of the three-dimensional configuration that the target motif forms when it folds. A variety of target motifs are known in the art. Protein target motifs include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).

A variety of structural formats can be used to input and output information in the computer-based systems of the present invention. A preferred output format ranks fragments of the H. influenzae Rd genome by their degree of homology to the target sequence or target motif. Such a presentation gives a skilled artisan a ranking of sequences that contain varying amounts of the target sequence or target motif. It also identifies the degree of homology in each identified fragment.

A variety of comparison means can be used to compare a target sequence or target motif with the data storage means and so identify sequence fragments of the H. influenzae Rd genome. In the embodiments of the present invention, the BLAST and BLAZE algorithms (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)) were used to identify open reading frames within the H. influenzae Rd genome. A skilled artisan can readily recognize that any publicly available homology search program can serve as the search means of the computer-based systems of the present invention.

FIG. 2 shows one application of this embodiment, a block diagram of a computer system 102 that can be used to implement the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110. These include a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may be, for example, a floppy disk drive, a CD-ROM drive or a magnetic tape drive. A removable storage medium 116 (such as a floppy disk, compact disk or magnetic tape) that holds recorded control logic and/or data may be inserted into the removable medium storage device 114. The computer system 102 includes software suitable for reading the control logic and/or data from the removable storage medium 116 once it has been inserted into the removable medium storage device 114.

The nucleotide sequence of the present invention may be stored, in a well-known manner, in the main memory 108, in any of the secondary storage devices 110 and/or on the removable storage medium 116.
Software for accessing and processing the genomic sequence (such as search tools and comparison tools) resides in main memory 108 during execution.

Biochemical Embodiments

Another embodiment of the present invention is directed to isolated fragments of the H. influenzae Rd genome. These fragments include, but are not limited to, the following:
- fragments that encode peptides (hereinafter open reading frames, or ORFs);
- fragments that modulate the expression of an operably linked ORF (hereinafter expression modulating fragments, or EMFs);
- fragments that mediate the uptake of a linked DNA fragment into a cell (hereinafter uptake modulating fragments, or UMFs);
- fragments that can be used to diagnose the presence of H. influenzae Rd in a sample (hereinafter diagnostic fragments, or DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the H. influenzae Rd genome" is a nucleic acid molecule with a specific nucleotide sequence that has been purified. The purification reduces the number of compounds normally associated with it in the composition. A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to, methods that separate the constituents of a solution on the basis of charge, solubility or size.

In one embodiment, H. influenzae Rd DNA can be mechanically sheared to produce fragments 15-20 kb in length. These fragments can then be inserted into lambda clones, as described in the Examples below, to generate an H. influenzae Rd library. The nucleotide sequence information in SEQ ID NO: 1 can then be used to design primers that flank a chosen sequence, for example an ORF listed in Table 1(a). The ORF can then be isolated from the lambda DNA library by PCR cloning.
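Designing primers that flank an ORF amounts to taking short stretches on either side of the ORF and reverse-complementing the downstream stretch. The minimal sketch below assumes a forward-strand ORF given in 1-based inclusive coordinates. The primer length, the spacing from the ORF and the helper names are hypothetical choices. The Wallace-rule melting temperature is only a rough guide for short oligonucleotides.

```python
from typing import Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(_COMP)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def flanking_primers(genome: str, start: int, end: int,
                     primer_len: int = 20, offset: int = 50) -> Tuple[str, str]:
    """Forward and reverse primers just outside a forward-strand ORF.

    start and end are 1-based inclusive coordinates with start < end.
    The forward primer ends `offset` bases upstream of the ORF. The reverse
    primer is the reverse complement of the stretch that begins `offset`
    bases downstream, so both primers read 5'->3' toward the ORF.
    """
    if start >= end:
        raise ValueError("expects a forward-strand ORF (start < end)")
    genome = genome.upper()
    f_end = start - 1 - offset        # 0-based exclusive end of the forward primer
    r_start = end + offset            # 0-based start of the reverse-primer template
    if f_end - primer_len < 0 or r_start + primer_len > len(genome):
        raise ValueError("flanking region runs off the sequence")
    forward = genome[f_end - primer_len:f_end]
    reverse = revcomp(genome[r_start:r_start + primer_len])
    return forward, reverse
```

Practical primer design also checks primer uniqueness, self-complementarity and matched melting temperatures. Those checks are omitted here.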
PCR cloning techniques are well known in the art. Given SEQ ID NO: 1 and Tables 1(a) and 2, isolating any ORF or other nucleic acid fragment of the present invention would therefore be routine. The isolated nucleic acid molecules of the present invention include, but are not limited to, single-stranded and double-stranded DNA, and single-stranded RNA.

As used herein, an "open reading frame" (ORF) is a series of nucleotide triplets that code for amino acids without any termination codon. It is a sequence that can be translated into protein.

Tables 1(a), 1(b) and 2 identify ORFs in the H. influenzae Rd genome. In particular, Table 1(a) gives the location of ORFs in the H. influenzae genome that encode the listed proteins. Each assignment is based on a homology match to a protein sequence from the organism shown in parentheses (see column 4 of Table 1(a)).

The first column of Table 1(a) gives the "gene ID" of each ORF. This information is useful for two reasons. First, the complete map of the H. influenzae Rd genome in FIGS. 6(A)-6(D) is labeled with these gene ID numbers. Second, Table 1(b) uses the gene ID numbers to indicate which ORFs have already been deposited in public databases.

Columns 2 and 3 of Table 1(a) give the position of each ORF within the nucleotide sequence of SEQ ID NO: 1. One of ordinary skill in the art will recognize that ORFs may run in either direction in the H. influenzae genome, and columns 2 and 3 reflect this orientation.

Column 5 of Table 1(a) gives the percent identity between the protein encoded by the ORF and the corresponding protein from the organism shown in parentheses in column 4. Column 6 gives the percent similarity between the same two proteins.
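Columns 5 and 6 report percent identity and percent similarity. For two sequences that have already been aligned, both values can be computed as in the sketch below. The residue groups that define "similar" are an illustrative assumption. Search programs such as BLAST instead count positions with a positive substitution-matrix score. The denominator also varies between tools; here it is the full alignment length, gaps included.

```python
from typing import Tuple

# Illustrative "similar" residue groups; a hypothetical choice for this sketch only.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]
_GROUP_OF = {aa: i for i, grp in enumerate(SIMILAR_GROUPS) for aa in grp}


def identity_and_similarity(a: str, b: str) -> Tuple[float, float]:
    """Percent identity and percent similarity of two aligned, equal-length sequences.

    Columns containing a gap ('-') count as neither identical nor similar;
    identical residues also count as similar.
    """
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        if x == y:
            identical += 1
            similar += 1
        elif _GROUP_OF.get(x) is not None and _GROUP_OF.get(x) == _GROUP_OF.get(y):
            similar += 1
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n
```

The sequences MKLEVGHPQW and FKAEIGHPQW differ at positions 1, 3 and 5. The V/I pair at position 5 falls in the same group, so the function returns 70% identity and 80% similarity. This matches the 10-residue example discussed in the text.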
The concepts of percent identity and percent similarity between two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids long that differ at three positions (e.g., positions 1, 3 and 5) are said to have 70% identity. The same two polypeptides would have 80% similarity if, for example, the non-identical amino acids at position 5 were "similar" (i.e., had similar biochemical characteristics). Column 7 of Table 1(a) gives the length of the amino acid homology match.

Table 2 lists ORFs of the H. influenzae Rd genome whose encoded polypeptide sequences produced no "homology match" with a known protein sequence from another organism. The Examples below give further details of the algorithms and criteria used for the homology searches.

A skilled artisan can readily identify ORFs in the H. influenzae Rd genome beyond those listed in the tables and those found with the computer-based systems of the present invention. Examples include ORFs that overlap an identified ORF or that are encoded on the strand opposite it.

As used herein, an "expression modulating fragment" (EMF) is a series of nucleotides that modulates the expression of an operably linked ORF or EMF. A sequence is said to "modulate the expression of an operably linked sequence" when the presence of the EMF alters the expression of that sequence. EMFs include, but are not limited to, promoters and promoter modulating sequences (inducible elements). One class of EMFs consists of fragments that induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event. For reviews of known EMFs from the genus Haemophilus, see Tomb et al. (104: 1-10 (1991)) and Chandler, M. S. (Proc. Natl. Acad. Sci.
USA89: 1826-1830 (1992)) You. The EMF front row depends on the proximity to the ORF provided in Tables 1 (a), 1 (b) and 2. Can be identified in the Rd genome. Table 1 (a), 1 (b) or Is the length of about 10 to 200 nucleotides taken at the 5 'of any one of the two ORFs The intergenic fragment or the fragment of the intergenic fragment In a manner similar to that found in the column, expression of the operably linked 3 'ORF was Will adjust. As used herein, “intergenic fragment” is defined herein as The fragment of the genus Haemophilus between the two ORFs described in the above. Some EMF is a target sequence or target motif in the computer-based system of the present invention. Can be identified using known EMF as the The presence and activity of EMF can be confirmed using an EMF trap vector. Wear. The EMF trap vector has a cloning site 5 'for the marker sequence. Have. The marker sequence has an identifiable phenotype, such as antibiotic resistance or recruitment The auxotrophic factor is encoded and the EMF trap vector is The above phenotype can be identified or assayed when placed in a suitable host under the circumstances. You. As described above, EMF regulates the expression of an operably linked marker sequence. Will do. More detailed consideration of various marker sequences Is provided below. A sequence considered to be EMF is replaced with a marker sequence in the EMF trap vector. Cloned in all three open reading frames of one or more restriction sites upstream from I do. This vector is then transformed into a suitable host using known methods. And the phenotype of the transformed host is tested under appropriate conditions. As mentioned above , EMF regulates the expression of an operably linked marker sequence. As used herein, an "uptake regulatory fragment", UMF, binds DN A series of nucleotide molecules that mediate the uptake of the A fragment into cells You. UMF uses the computer-based system described above to target sequence or target. 
Can be easily identified using a known UMF as the genetic motif. The presence and activity of UMF allows what is considered UMF to bind to the marker sequence. Can be confirmed. Next, the obtained nucleic acid molecule is transferred to a suitable host under appropriate conditions. And measure the uptake of the marker sequence. Above Thus, UMF increases the frequency of uptake of the bound marker sequence. Haemophilus For a review of DNA uptake in Goodgal S. H. (Goodgall, SH) J. et al. Bact.172: 5924-5928 (1990). As used herein, "diagnostic fragment", DF, A sequence of nucleotide sequences that selectively hybridizes to a sequence. DF Is by identifying unique sequences within the Haemophilus influenzae Rd genome or , In a suitable diagnostic format to measure amplification or hybridization selectivity Generating and testing a probe or amplification primer consisting of the F sequence Can be easily identified. Sequences that fall within the scope of the present invention are not limited to the particular sequences described herein. It also includes those alleles and species variants. Alleles and variants are sequence knowledge SEQ ID NO: 1, a representative fragment thereof or a sequence identifier number: 1 At least 99.9% identical nucleotide sequence to another isolate of the same species Can be routinely determined by comparing with the sequence obtained from Further In order to accommodate codon variability, the present invention provides certain O 2 compounds disclosed herein. Nucleic acid molecules that encode the same amino acid sequence as that encoded by RF are included. In other words, one codon replaces the same amino acid in the coding region of the ORF. It is expressly intended to replace it with another coding codon. Any particular sequence disclosed herein may be a specific fragment, such as an ORF. Re-sequencing the sequence in both directions (ie, the front row of both strands) Can be screened. 
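Identifying unique sequences, as described above for DFs, can be illustrated computationally. The Python sketch below is a minimal illustration, not the procedure used to select the DFs of the invention: it counts every k-mer on both strands of a genome supplied as a plain string and reports the forward-strand k-mers that occur exactly once. The function names, the choice of k, and the treatment of the genome as a linear string (the H. influenzae chromosome is in fact circular) are assumptions made for illustration. A sequence that is unique within one genome is only a candidate DF until its selectivity has been tested in a hybridization or amplification format.

```python
from collections import Counter

# Complement table for uppercase DNA (assumes only A, C, G, T are present).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_kmers(genome: str, k: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs for forward-strand k-mers that occur
    exactly once when both strands of the (linear) genome are counted."""
    genome = genome.upper()
    counts = Counter()
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return [
        (i, genome[i:i + k])
        for i in range(len(genome) - k + 1)
        if counts[genome[i:i + k]] == 1
    ]


print(unique_kmers("AAAAAAGCT", 3))  # [(4, 'AAG')]
```

Because both strands are counted, a palindromic k-mer (one equal to its own reverse complement) is counted twice per occurrence and is therefore never reported; this is a deliberate simplification, since such a probe would bind both strands of the same site.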
Alternatively, screening for errors can be carried out by sequencing the corresponding polynucleotide of Haemophilus influenzae origin, isolated by using part or all of the fragment in question as a probe or primer.
Each of the ORFs of the Haemophilus influenzae Rd genome disclosed in Tables 1(a), 1(b) and 2, and the EMFs found 5' of those ORFs, can be used as polynucleotide reagents in numerous ways. For example, these sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe, such as H. influenzae Rd, in a sample. This is especially the case for fragments or ORFs that are highly selective for Haemophilus influenzae.
In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or through antisense DNA or RNA, both of which rely on the binding of a polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary either to a region of the gene involved in transcription (triple helix: Lee et al., Nucl. Acids Res. 6: 3073 (1979); Cooney et al., Science 241: 456 (1988); Dervan et al., Science 251: 1360 (1991)) or to the mRNA itself (antisense: Okano, J. Neurochem. 56: 560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Florida (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. The information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide.
The invention further provides recombinant constructs comprising one or more fragments of the Haemophilus influenzae Rd genome of the present invention. The recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the H. influenzae Rd genome has been inserted in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including, for example, a promoter, operably linked to the ORF. In the case of a vector comprising an EMF or UMF of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF or UMF.
Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Bacterial: pBs, phagescript, phiX174, pBluescript SK, pBS KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).
Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retroviruses, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
The present invention further provides host cells containing at least one of the isolated fragments of the Haemophilus influenzae Rd genome of the present invention, wherein the fragment has been introduced into the host cell using known methods.
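The EMF-bearing constructs described above start from the 5' flanking region of an ORF, and that region can be pulled directly out of the genome sequence. The sketch below is illustrative only: `upstream_region` and its parameters are hypothetical names, ORF coordinates are assumed to be 0-based and half-open on the forward strand, and the genome is treated as a linear string. The 200-nucleotide default simply reflects the upper end of the 10 to 200 nucleotide range given earlier for candidate EMFs; the sketch does not trim the window at the end of a neighboring ORF, which would be needed to obtain a strictly intergenic fragment.

```python
# Complement table for uppercase DNA (assumes only A, C, G, T are present).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str,
                    length: int = 200) -> str:
    """Return up to `length` nt immediately 5' of an ORF, read 5'->3' on the
    ORF's own strand.

    Assumptions (hypothetical convention): `start`/`end` are 0-based,
    half-open forward-strand coordinates; no wrap-around is handled.
    """
    if strand == "+":
        # The ORF begins at `start`, so its 5' flank lies to the left.
        return genome[max(0, start - length):start]
    if strand == "-":
        # On the reverse strand the ORF begins at `end`; its 5' flank lies to
        # the right on the forward strand and must be reverse-complemented.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


print(upstream_region("TTTGGGATGCCC", 6, 12, "+", 3))  # GGG
print(upstream_region("TTACATGGG", 0, 6, "-", 3))      # CCC
```

In the second call the minus-strand ORF occupies forward positions 0 to 5, which read ATGTAA on its own strand, so its 5' flank is the reverse complement of the three bases to its right.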
The host cell can be a higher eukaryotic host cell, such as a mammalian cell, or a lower eukaryotic host cell, such as a yeast cell. Alternatively, the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or transduction (Davis, L., et al., Basic Methods in Molecular Biology (1986)).
A host cell containing one of the fragments of the Haemophilus influenzae Rd genome of the present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF), or to produce a protein under the control of the EMF.
The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of those nucleic acid fragments. By "degenerate variant" is intended a nucleotide fragment that differs from a nucleic acid fragment of the present invention (e.g., an ORF) in nucleotide sequence but, owing to the degeneracy of the genetic code, encodes an identical polypeptide sequence. Preferred nucleic acid fragments of the present invention are the ORFs shown in Table 1(a) that encode proteins.
A variety of methodologies known in the art can be used to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful for producing small peptides and fragments of larger polypeptides. Fragments are useful, for example, in generating antibodies against the native polypeptide. In an alternative method, the polypeptide or protein is purified from bacterial cells that naturally produce it.
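Whether two coding sequences are degenerate variants in the sense defined above can be checked mechanically by translating both and comparing the products. The Python sketch below builds the standard genetic code table from the conventional TCAG ordering (bacteria use the same amino acid assignments); the function names are illustrative, and the sketch assumes in-frame sequences whose length is a multiple of three, with no handling of ambiguous bases.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG x TCAG x TCAG order; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[p:p + 3]] for p in range(0, len(orf), 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ but encode the same polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


print(translate("ATGCTGAAATAA"))                        # MLK*
print(is_degenerate_variant("ATGCTGAAA", "ATGTTAAAG"))  # True
```

In the example, CTG and TTA both encode leucine and AAA and AAG both encode lysine, so the two nucleotide sequences differ at three positions yet encode the identical tripeptide.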
One skilled in the art can readily follow known methods for isolating polypeptides and proteins in order to obtain one of the isolated polypeptides or proteins of the present invention. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography and immunoaffinity chromatography.
The polypeptides and proteins of the present invention can alternatively be purified from cells that have been altered to express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein that it normally does not produce, or that it normally produces only at lower levels. One skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences in eukaryotic or prokaryotic cells in order to generate a cell that produces one of the polypeptides or proteins of the present invention.
Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells and Sf9 cells, as well as prokaryotic hosts such as E. coli and Bacillus. The most preferred cells are those that do not normally express the particular polypeptide or protein, or that express it only at a low natural level.
As used herein, "recombinant" means that a polypeptide or protein is derived from a recombinant (e.g., microbial or mammalian) expression system. "Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems.
As a product, "recombinant microbial" defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that of those expressed in mammalian cells.
"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA fragments encoding the polypeptides and proteins provided herein are assembled from fragments of the H. influenzae Rd genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene that can be expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
"Recombinant expression vehicle or vector" refers to a plasmid, phage, virus or other vector for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence that is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of the translated protein by the host cell. Alternatively, where a recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide the final product.
"Recombinant expression system" means host cells that have stably integrated a recombinant transcriptional unit into chromosomal DNA or that carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA fragment or synthetic gene to be expressed.
Mature proteins can be expressed in mammalian cells, yeast, bacteria or other cells. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd edition, Cold Spring Harbor, New York (198), the disclosure of which is incorporated herein by reference.
Generally, recombinant expression vectors include an origin of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences and, preferably, a leader sequence capable of directing secretion of the translated protein into the extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide that imparts desired characteristics, e.g., stabilization or simplified purification of the expressed recombinant product.
Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein, together with suitable translation initiation and termination signals, in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desired, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces and Staphylococcus, although others may also be employed as a matter of choice.
As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and a bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well-known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec, Madison, Wisconsin, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.
Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is induced by appropriate means (e.g., temperature shift or chemical induction) and the cells are cultured for an additional period. Cells are typically harvested by centrifugation and disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Various mammalian cell culture systems can also be employed to express recombinant proteins.
Examples of mammalian expression systems include the COS-7 line of monkey kidney fibroblasts described by Gluzman, Cell 23: 175 (1981), and other cell lines capable of expressing a compatible vector, for example the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and any necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example the SV40 origin, early promoter, enhancer, splice and polyadenylation sites, can be used to provide the required nontranscribed genetic elements.
Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion-exchange or size-exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing the configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for the final purification steps. Microbial cells employed in the expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption or the use of cell-lysing agents.
The present invention further includes isolated polypeptides, proteins and nucleic acid molecules that are substantially equivalent to those described herein. As used herein, "substantially equivalent" can refer both to nucleic acid and to amino acid sequences, for example a mutant sequence, that vary from a reference sequence by one or more substitutions, deletions or additions, the net effect of which does not result in an adverse functional dissimilarity between the reference and subject sequences.
For purposes of the present invention, sequences having equivalent biological activity and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.
The invention further provides methods of obtaining homologs, from other strains of H. influenzae, of the fragments of the Haemophilus influenzae Rd genome of the present invention, and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of H. influenzae is defined as a homolog of a fragment of the H. influenzae Rd genome of the present invention, or of a protein encoded by one of the ORFs of the present invention, if it shares significant homology with that fragment or protein. Specifically, by using the sequences disclosed herein as probes or primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions that are at least …% identical (at the amino acid or nucleic acid level). Region-specific primers or probes derived from the nucleotide sequence provided in SEQ ID NO: 1, or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog using known methods (Innis et al., PCR Protocols, Academic Press, San Diego, CA (1990)). When using primers derived from SEQ ID NO: 1, or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60°C), only sequences that are greater than 75% homologous to the primer will be amplified.
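The percent identity and percent similarity definitions given at the start of this section, and the identity thresholds used above, can be made concrete with a short calculation. The Python sketch below reproduces the 70% identity / 80% similarity example for two 10-residue peptides. It assumes the two sequences are already aligned without gaps (real homology searches first compute an alignment), and the grouping of "similar" residues is a hypothetical, simplified scheme chosen for illustration, not the scoring used in the Examples.

```python
# Hypothetical similarity groups: residues in the same set are treated as
# "similar" (shared biochemical characteristics). Real scoring schemes differ.
SIMILARITY_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FYW"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("NQ"),     # amides
    set("ST"),     # hydroxyl
    set("G"), set("P"), set("C"),
]


def _check_lengths(seq_a: str, seq_b: str) -> None:
    # The sketch assumes the sequences are already aligned without gaps.
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")


def _similar(a: str, b: str) -> bool:
    return a == b or any(a in group and b in group for group in SIMILARITY_GROUPS)


def percent_identity(seq_a: str, seq_b: str) -> float:
    _check_lengths(seq_a, seq_b)
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def percent_similarity(seq_a: str, seq_b: str) -> float:
    _check_lengths(seq_a, seq_b)
    similar = sum(_similar(a, b) for a, b in zip(seq_a, seq_b))
    return 100.0 * similar / len(seq_a)


# Two 10-residue peptides differing at positions 1 (A/W), 3 (D/K) and 5 (F/Y);
# only the F/Y pair falls within a shared group.
print(percent_identity("ACDEFGHIKL", "WCKEYGHIKL"))    # 70.0
print(percent_similarity("ACDEFGHIKL", "WCKEYGHIKL"))  # 80.0
```

`percent_identity` works unchanged on aligned nucleotide strings, which is the sense of the 99.9% identity criterion applied to SEQ ID NO: 1: over 1,000 aligned bases, 99.9% identity allows a single mismatch.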
By employing lower stringency conditions (e.g., annealing at 35-37°C), sequences that are greater than 40-50% homologous to the primer will also be amplified.
When using DNA probes derived from SEQ ID NO: 1, or from a nucleotide sequence at least 99.9% identical to SEQ ID NO: 1, for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5× SSPC and 50% formamide, and washing at 50-65°C in 0.5× SSPC), sequences having regions that are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing in 5× SSPC and 40-45% formamide, and washing in SSPC), sequences having regions that are greater than 35-45% homologous to the probe will be obtained.
Any organism can be used as the source of homologs of the present invention, as long as the organism naturally expresses such a protein or contains genes encoding such a protein. The most preferred organisms for isolating homologs are bacteria that are closely related to H. influenzae Rd.
Uses of the compositions of the present invention. Each ORF provided in Table 1(a) has been assigned to one of 102 biological role categories according to Riley, M. (Microbiol. Rev. 57(4): 862 (1993)). This allows the skilled artisan to determine a use for each identified coding sequence. Table 1(a) further identifies the type of polypeptide encoded by each ORF. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the putative identification of each polypeptide. Such identifications permit one skilled in the art to use the H. influenzae ORFs in a manner similar to the known type of sequence for which the identification was made, for example to ferment particular sugar sources or to produce particular metabolites.
(For a review of enzymes used within commercial industry, see Biochemical Engineering and Biotechnology Handbook, 2nd edition, Macmillan Publ. Ltd., New York (1991), and Biocatalysts in Organic Syntheses, J. Tramper et al. (eds.), Elsevier Science Publishers, Amsterdam, The Netherlands (1985).)
1. Biosynthetic enzymes
Open reading frames encoding proteins that mediate the catalytic reactions of intermediary and macromolecular metabolism can be used for industrial biosynthesis. These include enzymes involved in the degradation of metabolic intermediates, central intermediary metabolism, aerobic and anaerobic respiration, fermentation, ATP-proton motive force interconversion, broad regulatory functions, and the synthesis of amino acids, nucleotides, cofactors, vitamins and small molecules, as well as proteins involved in cellular processes and other functions. The various metabolic pathways present in Haemophilus can be identified from its absolute nutritional requirements as well as by examining the various enzymes identified in Table 1(a).
Many of the proteins encoded by the ORFs assigned to the intermediary metabolism category in Table 1(a) are involved in the degradation of intermediary metabolites and in non-macromolecular metabolism. Some of the enzymes identified include amylases, glucose oxidases and catalases.
Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes are used in a number of industrial processes, including the processing of flax and other plant fibers, the extraction, clarification and depectinization of fruit juices, the extraction of vegetable oils, and the maceration of fruits and vegetables. A detailed review of the proteolytic enzymes used in the food industry is provided by Rombouts et al. (Symbiosis 21: 79 (1986)) and Voragen et al. (Biocatalysis in Agricultural Biotechnology, J. R. Whitaker et al. (eds.), American Chemical Society Symposium Series 389: 93 (1989)).
The metabolism of glucose, galactose, fructose and xylose is an important part of the primary metabolism of Haemophilus. Enzymes involved in the degradation of these sugars can be used in industrial fermentation. Commercially important sugar-transforming enzymes include sugar isomerases such as glucose isomerase. Other metabolic enzymes have also found commercial use, such as glucose oxidase, which produces ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid by the Reichstein procedure (Krueger, Biotechnology 6(A), H.-J. Rehm (ed.), Verlag Chemie, Weinheim, Germany (1984)).
Glucose oxidase (GOD) is commercially available and has been used in purified as well as immobilized form for the deoxygenation of beer (Hartmeier et al., Biotechnology Letters 1: 21 (1979)). The most important application of GOD is the industrial-scale fermentation of gluconic acid. Market uses for gluconic acid include the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries (Bigelis, Gene Manipulations in Fungi, J. W. Bennett et al. (eds.), Academic Press, New York (1985), p. 357). In addition to these industrial applications, GOD has found use in medicine for the quantitative determination of glucose in body fluids and, more recently, in biotechnology for analyzing syrups obtained from starch and cellulose hydrolysates (Owusu et al., Biochim. Biophys. Acta 872: 83 (1986)).
The main sweetener used in the world today is sucrose, obtained from sugar beet and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today.
Soluble enzymes were used initially, and immobilized enzymes were developed later (Krueger et al., Biotechnology, A Textbook of Industrial Microbiology, Sinauer Associates, Sunderland, Massachusetts (1990)). Today, the use of immobilized enzymes to produce high-fructose syrup from glucose is by far the largest industrial business of this kind. A review of the industrial use of these enzymes is provided by Jørgensen (Starch 40: 307 (1988)).
Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in industry. Because of their industrial importance, there is a large body of published and unpublished information on the use of these enzymes in industrial processes (Faultman et al., Acid Proteases: Structure, Function and Biology, J. Tang (ed.), Plenum Press, New York (1977); Godfrey et al., Industrial Enzymes, Macmillan Publishers, Surrey, UK (1983); Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).
Another class of commercially useful proteins of the present invention are the microbial lipases identified in Table 1 (Macrae et al., Philosophical Transactions of the Royal Society of London 310: 227 (1985); Poserke, Journal of the American Oil Chemists' Society 61: 1758 (1984)). Lipases are used mainly in the fats and oils industry to produce neutral glycerides by lipase-catalyzed interesterification of readily available triglycerides. Lipase applications also include use as a detergent additive to facilitate the removal of fats from fabrics during washing.
The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is rapidly gaining popularity. One area of great importance is the production of chiral intermediates.
The production of chiral intermediates is important to a wide range of synthetic chemists, particularly those involved in the development of new pharmaceuticals, agrochemicals, fragrances and flavors (Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)). The following enzyme-catalyzed reactions are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles; esterification reactions; trans-esterification reactions; synthesis of amides; reduction of alkanones and oxoalkanoates; oxidation of alcohols to carbonyl compounds; oxidation of sulfides to sulfoxides; and carbon bond-forming reactions such as the aldol reaction.
When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis, it is sometimes necessary to weigh the respective advantages and disadvantages of using a whole microorganism as opposed to an isolated enzyme. The pros and cons of using a whole-cell system on the one hand, or an isolated, partially purified enzyme on the other, are described in detail by Bud et al. (Chemistry in Britain (1987), p. 127).
Aminotransferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantages of a microbial enzyme system are that the aminotransferase enzymes catalyze the stereoselective synthesis of L-amino acids only and generally have high catalytic rates. The use of aminotransferases for amino acid production is described by Rozzell (Methods in Enzymology 136: 479 (1987)).
Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair and recombination. A variety of commercially important enzymes have previously been isolated from members of the genus Haemophilus. These include the HincII, HindIII and HinfI restriction endonucleases.
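The restriction endonucleases named above each cut at a short recognition sequence, so their sites in a genome sequence can be located with a simple pattern scan. The sketch below uses the commonly cited recognition sequences AAGCTT (HindIII), GANTC (HinfI) and GTYRAC (HincII), written in IUPAC ambiguity code; the dictionary names and helper functions are hypothetical, chosen for illustration. All three sites are palindromic, so scanning one strand finds every site, and a lookahead pattern is used so that overlapping matches are also reported.

```python
import re

# IUPAC codes needed for the sites below (R = A/G, Y = C/T, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Commonly cited recognition sequences (cut positions are not modeled here).
RECOGNITION_SITES = {"HindIII": "AAGCTT", "HinfI": "GANTC", "HincII": "GTYRAC"}


def site_to_regex(site: str) -> str:
    """Expand an IUPAC recognition sequence into a regular expression."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = re.compile(f"(?={site_to_regex(site)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


def map_sites(seq: str) -> dict[str, list[int]]:
    """Scan one strand for each enzyme's site (all three are palindromic)."""
    return {name: find_sites(seq, site) for name, site in RECOGNITION_SITES.items()}


print(map_sites("AAGCTTGACTCGTCAAC"))
# {'HindIII': [0], 'HinfI': [6], 'HincII': [11]}
```

A non-palindromic recognition sequence would additionally require scanning the reverse complement, which this sketch deliberately omits.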
Table 1(a) identifies a wide array of enzymes with direct uses in the biotechnology industry, such as restriction enzymes, ligases, gyrases and methylases.
2. Antibody production
As described herein, the proteins of the present invention, as well as their homologs, can be used in a variety of procedures and methods known in the art that are currently applied to other proteins. The proteins of the present invention can further be used to generate antibodies that bind selectively to them. Such antibodies can be either monoclonal or polyclonal, and can be in the form of antibody fragments or humanized forms of the antibodies.
The present invention further provides antibodies that selectively bind to one of the proteins of the present invention, and hybridomas that produce such antibodies. A hybridoma is an immortalized cell line capable of secreting a specific monoclonal antibody. In general, techniques for preparing polyclonal and monoclonal antibodies, and hybridomas capable of producing the desired antibody, are well known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35: 1-21 (1980); Kohler and Milstein, Nature 256: 495-497 (1975)), as are the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985), pp. 77-96).
Any animal (mouse, rabbit, etc.) known to produce antibodies can be immunized with the peptide. Methods of immunization are well known in the art and include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by an ORF of the present invention used for immunization will vary with the animal being immunized, the antigenicity of the peptide and the site of injection.
The protein used as an immunogen may be modified, or administered in an adjuvant, in order to increase its antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen to a heterologous protein (e.g., globulin or β-galactosidase).
For monoclonal antibodies, spleen cells from the immunized animal are removed and fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, to produce monoclonal antibody-producing hybridoma cells. Any one of a number of methods well known in the art can be used to identify a hybridoma cell that produces an antibody with the desired characteristics. These include screening the hybridomas by ELISA, western blot analysis or radioimmunoassay (Lutz et al., Exp. Cell Res. 175: 109-124 (1988)). Hybridomas secreting the desired antibodies are cloned, and their class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).
Techniques described for the production of single-chain antibodies (U.S. Pat. …) can be adapted to produce single-chain antibodies to the proteins of the present invention.
For polyclonal antibodies, antibody-containing antisera are isolated from the immunized animal and screened for the presence of antibodies with the desired specificity using one of the procedures described above.
The present invention further provides the antibodies described above in detectably labeled form. Antibodies can be detectably labeled with radioisotopes, affinity labels (e.g., biotin, avidin), enzyme labels (e.g., horseradish peroxidase, alkaline phosphatase), fluorescent labels (e.g., FITC or rhodamine), paramagnetic atoms and the like.
Procedures for accomplishing such labeling are well known in the art and are described, for example, in Sternberger, L.A. et al., J. Histochem. Cytochem. 18: 315 (1970); Bayer, E.A. et al., Meth. Enzym. 62: 308 (1979); Engval, E. et al., Immunol. 109: 129 (1972); and Goding, J.W., J. Immunol. Meth. 13: 215 (1976). The labeled antibodies of the present invention can be used in in vitro, in vivo and in situ assays to identify cells or tissues in which a fragment of the Haemophilus influenzae Rd genome is expressed.

The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and Sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D.M. et al., "Handbook of Experimental Immunology", 4th Edition, Blackwell Scientific Publications, Oxford, UK, Chapter 10 (1986); Jacoby, W.D. et al., Meth. Enzym. 34, Academic Press, New York (1974)). The immobilized antibodies of the present invention can be used in in vitro, in vivo and in situ assays, as well as for immunoaffinity purification of the proteins of the present invention.

3. Diagnostic assays and kits

The present invention further provides methods for identifying the expression of one of the ORFs of the present invention, or a homolog thereof, in a test sample using one of the DFs or antibodies of the present invention. In particular, such methods comprise incubating a test sample with one or more antibodies or one or more DFs of the present invention, and assaying for binding of the DF or antibody to components within the test sample. Conditions for incubating a DF or antibody with a test sample vary.
Incubation conditions depend on the format employed in the assay, the detection method employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G.R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, Florida, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); and Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

The test samples of the present invention include cells, protein or membrane extracts of cells, and biological fluids such as sputum, blood, serum, plasma or urine. The test sample used in the above-described method will vary based on the assay format, the nature of the detection method, and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted to obtain a sample compatible with the system used.

In another embodiment of the present invention, kits are provided that contain the reagents necessary to carry out the assays of the present invention. Specifically, the invention provides a compartmentalized kit that receives, in close confinement, one or more containers, comprising: (a) a first container comprising one of the DFs or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, and reagents capable of detecting the presence of bound DF or antibody.

In particular, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers, and strips of plastic or paper. Such containers allow reagents to be transferred efficiently from one compartment to another without cross-contaminating the samples and reagents, and allow the reagents or solutions of each container to be added quantitatively from one compartment to another. Such containers include a container that accepts the test sample, a container that holds the antibodies used in the assay, containers that hold wash reagents (e.g., phosphate-buffered saline, Tris buffers), and containers that hold the reagents used to detect the bound antibody or DF.

Types of detection reagents include labeled nucleic acid probes, labeled secondary antibodies or, if the primary antibody is labeled, enzymatic or antibody-binding reagents capable of reacting with the labeled antibody. One skilled in the art will readily recognize that the DFs and antibodies disclosed in the present invention can readily be incorporated into one of the established kit formats well known in the art.

4. Screening assay for binding agents

Using the isolated proteins of the present invention, the present invention further provides methods for obtaining and identifying agents that bind to a protein encoded by one of the ORFs of the present invention, or to one of the fragments of the Haemophilus genome described herein.
Specifically, the method comprises the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or with an isolated fragment of the Haemophilus genome; and (b) determining whether the agent binds to the protein or the fragment.

The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives or other pharmaceutical agents. These agents can be selected at random and screened, or they can be rationally selected or designed using protein modeling techniques.

For random screening, agents such as peptides, carbohydrates and pharmaceutical agents are selected at random and assayed for their ability to bind to a protein encoded by an ORF of the present invention.

Alternatively, agents can be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when it is chosen on the basis of the configuration of a particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence, in order to produce rationally designed antipeptide peptides (see, e.g., Hurby et al., Application of Synthetic Peptides: Antisense Peptides, In Synthetic Peptides, A User's Guide, W.H. Freeman, New York (1992), 288-307; and Kaspczak et al., Biochemistry 28: 9230-9238 (1989)) or pharmaceutical agents.

In addition to the foregoing, one class of the agents of the present invention, as broadly described, can be used to control gene expression by binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be screened at random or rationally designed or selected.
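Agents that bind an ORF can be designed directly from its sequence; the next paragraph discusses one such class, antisense oligonucleotides of 20 to 40 bases. As a minimal sketch of that sequence-based design step, the Python below enumerates candidate antisense oligonucleotides as reverse complements of fixed-length windows of a target sequence. The function names, the 20-base window and the GC-content filter are illustrative assumptions, not part of the described method, and triple-helix design is not modeled.

```python
# Hypothetical helpers for enumerating antisense oligonucleotide candidates.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def antisense_candidates(target: str, length: int = 20, gc_range=(0.4, 0.6)):
    """Yield (start, oligo) for each window of `target` whose antisense
    oligonucleotide has a GC fraction inside `gc_range` (assumed filter)."""
    target = target.upper()
    for start in range(len(target) - length + 1):
        oligo = reverse_complement(target[start:start + length])
        if gc_range[0] <= gc_fraction(oligo) <= gc_range[1]:
            yield start, oligo


if __name__ == "__main__":
    # Toy 20-base target, not a sequence from the genome.
    for start, oligo in antisense_candidates("AT" * 5 + "GC" * 5):
        print(start, oligo)
```

In practice each candidate would also be checked against the rest of the genome so that it hybridizes only to the intended ORF; that step is omitted here.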
By targeting an ORF or EMF, one skilled in the art can design sequence-specific or element-specific agents that modulate the expression of either a single ORF or multiple ORFs that rely on the same EMF for expression control.

One class of DNA-binding agents comprises agents containing base residues that hybridize to DNA or RNA, or bind to it to form a triple helix. Such agents can be based on the classic phosphodiester ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives that have base-attachment capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6: 3073 (1979); Cooney et al., Science 241: 456 (1988); and Dervan et al., Science 251: 1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56: 560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, Florida (1988)). Triple helix formation shuts off transcription of DNA into RNA, while antisense RNA hybridization blocks translation of the mRNA into polypeptide. Both techniques have been shown to be effective in model systems. The information contained in the sequences of the present invention is necessary for the design of antisense or triple helix oligonucleotides and other DNA-binding agents.

Agents that bind to a protein encoded by one of the ORFs of the present invention can be used as diagnostic agents, or to control bacterial infection by modulating the activity of the protein encoded by the ORF. Agents that bind to a protein encoded by one of the ORFs of the present invention can be formulated using known techniques into pharmaceutical compositions for controlling the growth of, and infection by, Haemophilus.

5. Vaccines and pharmaceutical compositions

The present invention further provides pharmaceutical agents that can be used to modulate the growth of Haemophilus influenzae, or of another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition that can be formulated using known techniques to provide a pharmaceutical composition. As used herein, "the pharmaceutical agents of the present invention" refers to pharmaceutical agents that are derived from a protein encoded by an ORF of the present invention, or that are agents identified using the assays described herein.

As used herein, a pharmaceutical agent is said to "modulate the growth of Haemophilus sp. or a related organism, in vivo or in vitro" when it reduces the rate of growth, the rate of division or the viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth of an organism in a number of ways, although an understanding of the underlying mechanism of action is not needed to practice their use. Some agents modulate growth by binding to an important protein, thereby blocking its biological activity, while other agents bind to components on the outer surface of the organism, blocking attachment or rendering the organism more susceptible to the host's natural immune system. Alternatively, an agent can comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of vaccines derived from outer membrane components, such as LPS, are well known in the art.

As used herein, a "related organism" is any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism contains a homolog of the protein that is the target of the agent, or of the protein used as a vaccine. A related organism therefore need not be a bacterium; it can be a fungal or viral pathogen.
The pharmaceutical agents and compositions of the present invention can be administered in a conventional manner, for example by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal route. The pharmaceutical compositions are administered in an amount effective to treat and/or prevent the specific indication. In general, they are administered in an amount of at least about 10 μg/kg body weight and, in most cases, in an amount not exceeding about 8 mg/kg body weight per day. In most cases the dose is from about 10 μg/kg to about 1 mg/kg body weight per day, taking into account the route of administration, the symptoms and so on.

The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties that are not normally part of the molecule. Such moieties can improve the molecule's solubility, absorption, biological half-life and so on. Alternatively, the moieties can reduce the toxicity of the molecule, or eliminate or attenuate undesirable side effects of the molecule. Moieties capable of mediating such effects are disclosed, for example, in Remington's Pharmaceutical Sciences (1980).

For example, changes in the immunological character of a functional derivative, such as altered affinity for a given antibody, are measured by a competitive-type immunoassay. Changes in immunomodulatory activity are measured by the appropriate assay. Modifications of protein properties such as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation, or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.
The therapeutic effects of the agents of the present invention can be obtained by providing the agent to a patient by any suitable means (e.g., inhalation, or intravenous, intramuscular, subcutaneous, enteral or parenteral administration). The agent of the present invention is preferably administered so as to achieve an effective concentration in the blood or tissue in which the growth of the organism is to be controlled. The preferred method for achieving an effective blood concentration is to administer the agent by injection. Administration can be by continuous infusion or by single or multiple injections.

When one of the agents of the present invention is provided to a patient, the dose of agent administered will vary depending on factors such as the patient's age, weight, height, sex, general medical condition and previous medical history. In general, it is desirable to provide the recipient with a dose of agent in the range of about 1 pg/kg to 10 mg/kg (patient body weight), although a lower or higher dose can be administered. The therapeutically effective dose can be lowered by using the agents of the present invention in combination with one another or with other agents.

As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either (1) the physiological effects of each compound or (2) the serum concentrations of each compound can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.

The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to reduce the rate of growth (as defined above) of the target organism.

Administration of the agent(s) of the invention can be for either a "prophylactic" or a "therapeutic" purpose.
When provided prophylactically, the agent(s) are provided in advance of any symptom indicative of the organism's growth. Prophylactic administration of the agent(s) serves to prevent, attenuate or reduce the rate of subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. Therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to the animal in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmaceutically acceptable" if its administration can be tolerated by the recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of the recipient patient.

The agents of the present invention can be formulated according to known methods for preparing pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, including other human proteins such as human serum albumin, are described, for example, in Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton, Pennsylvania (1980)). To form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more agents of the present invention together with a suitable amount of carrier vehicle.
Additional pharmaceutical methods can be employed to control the duration of action. Controlled-release preparations can be achieved by using polymers to complex or absorb one or more agents of the present invention. Controlled delivery can be exercised by selecting appropriate macromolecules (for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylene-vinyl acetate, methylcellulose, carboxymethylcellulose or protamine sulfate), the concentration of the macromolecules, and the method of incorporation, in order to control release. Another possible method of controlling the duration of action by controlled-release preparations is to incorporate the agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene-vinyl acetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, they can be entrapped in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization (for example, hydroxymethylcellulose or gelatin microcapsules and poly(methyl methacrylate) microcapsules, respectively), or in colloidal drug delivery systems (for example, liposomes, albumin microspheres, microemulsions, nanoparticles and nanocapsules) or in macroemulsions. Such techniques are described in Remington's Pharmaceutical Sciences (1980).

The present invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the present invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
In addition, the agents of the present invention can be used in conjunction with other therapeutic compounds.

6. Shotgun method for megabase DNA sequencing

The present invention further provides the first demonstration that a sequence of more than 1 megabase can be determined using a random shotgun approach. As described in detail in the following examples, this method eliminated the up-front costs of isolating and sequencing overlapping or contiguous subclones prior to sequencing.

The following examples further illustrate certain features of the present invention and are not intended to limit it.

Examples

Experimental design and methods

1. Shotgun sequencing strategy

The overall strategy of the shotgun approach to whole genome sequencing is outlined in Table 3. The theory of shotgun sequencing follows from Lander and Waterman's application (Genomics 2: 231 (1988)) of the Poisson distribution $p_x = m^x e^{-m}/x!$, where x is the number of occurrences of an event, m is the mean number of occurrences, and $p_x$ is the probability that a given base has been sequenced x times after a fixed amount of random sequence has been generated. If L is the genome length, n is the number of sequence reads obtained from cloned inserts, and w is the sequencing read length, then $m = nw/L$, and the probability that no read begins within the w bases preceding a given base (that is, the probability that the base is not sequenced) is $p_0 = e^{-m}$. Expressing m in units of fold coverage, after 1.8 Mb of random sequence has been generated for the 1.8 Mb genome, m = 1, representing 1X coverage. In this case $p_0 = e^{-1} = 0.37$, so approximately 37% of the genome remains unsequenced.
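The coverage statistics above can be reproduced in a few lines. This is a minimal sketch that assumes the Lander-Waterman model (read start positions uniformly random, fixed read length w) and uses only L, n, w and m as defined in the text; the function names are illustrative. The gap count uses the relation that total gap length divided by mean gap size, $Le^{-m}/(L/n)$, equals $ne^{-m}$.

```python
import math


def poisson_pmf(x: int, m: float) -> float:
    """p_x = m^x e^{-m} / x!: probability that a base is sequenced x times."""
    return m ** x * math.exp(-m) / math.factorial(x)


def shotgun_coverage(genome_len: float, n_reads: int, read_len: float) -> dict:
    """Lander-Waterman style estimates for a random shotgun project.

    genome_len -- L, genome length in bp
    n_reads    -- n, number of sequence reads
    read_len   -- w, average usable read length in bp
    """
    m = n_reads * read_len / genome_len       # fold coverage, m = nw/L
    p0 = poisson_pmf(0, m)                    # p_0 = e^{-m}
    return {
        "fold_coverage": m,
        "p_unsequenced": p0,
        "total_gap_bp": genome_len * p0,      # L e^{-m}
        "n_gaps": n_reads * p0,               # (L e^{-m}) / (L/n) = n e^{-m}
        "mean_gap_bp": genome_len / n_reads,  # L/n
    }


if __name__ == "__main__":
    # About 1X and 5X coverage of a 1.8 Mb genome with 460 bp reads.
    for reads in (3913, 19565):
        stats = shotgun_coverage(1.8e6, reads, 460)
        print({key: round(value, 4) for key, value in stats.items()})
```

The value of m depends on how reads are counted: 9,500 clones read from both ends (19,000 reads of 460 bp) correspond to m of about 4.9 on a 1.8 Mb genome, close to the nominal 5X.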
For example, at 5X coverage (approximately 9,500 clones sequenced from both insert ends, with an average read length of 460 bp), $p_0 = e^{-5} = 0.0067$, so 0.67% of the genome remains unsequenced. The total gap length is $Le^{-m}$ and the average gap size is $L/n$, so the expected number of gaps is $ne^{-m}$; 5X coverage leaves approximately 128 gaps. This treatment follows Lander and Waterman (Genomics 2: 231 (1988)). Table 4 shows coverage of a 1.9 Mb genome with an average fragment size of 460 bp.

2. Random library construction

To approximate the random model described above during actual sequencing, a nearly ideal library of genomic fragments is required. The following library construction procedure was developed to achieve this.

H. influenzae Rd KW20 DNA was prepared by phenol extraction. A mixture (3.3 ml) containing 600 μg of DNA, 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA and 30% glycerol was sonicated for 1 min at 0°C at the lowest energy setting with a 3 mm probe (Branson Model 450 Sonicator). The DNA was ethanol precipitated and redissolved in 500 μl of TE buffer. To create blunt ends, a 100 μl aliquot was digested for 10 min at 30°C with 5 units of BAL31 nuclease (New England BioLabs) in 200 μl of BAL31 buffer. The DNA was phenol extracted, ethanol precipitated, redissolved in 100 μl of TE buffer and run on a 1.0% low-melting agarose gel; the 1.6-2.0 kb size fraction was excised, phenol extracted and redissolved in 20 μl of TE buffer.

A two-step ligation procedure yielded a plasmid library with 97% inserts, of which more than 99% were single inserts. The first ligation mixture (50 μl) contained 2 μg of the DNA fragments, 2 μg of SmaI/BAP pUC18 DNA (Pharmacia) and 10 units of T4 ligase (GIBCO/BRL), and was incubated at 14°C for 4 h. After phenol extraction and ethanol precipitation, the DNA was dissolved in 20 μl of TE buffer and electrophoresed on a 1.0% low-melting agarose gel. The ethidium bromide-stained linear bands, identified as insert (i), vector (v), v + i, v + 2i, v + 3i, ..., were visualized with 360 nm UV light, and the v + i DNA was excised and recovered in 20 μl of TE. The ends of the v + i linear DNA were then polished by T4 polymerase treatment for 5 min at 37°C in a reaction mixture (50 μl) containing the v + i linears, 500 μM of each of the 4 dNTPs and 9 units of T4 polymerase (New England BioLabs) under the recommended buffer conditions. After phenol extraction and ethanol precipitation, the recovered v + i linears were dissolved in 20 μl of TE. The final ligation to circularize the molecules was performed overnight at 14°C in a 50 μl reaction containing 5 μl of v + i linears and 5 units of T4 ligase. After 10 min at 70°C, the reaction mixture was stored at -20°C.

This two-step procedure produced a molecularly random collection of single-insert plasmid recombinants with minimal contamination by double-insert chimeras (<1%) or free vector (<3%).

Deviations from randomness are most likely to arise during cloning. E. coli host cells lacking all recombination and restriction functions (A. Greener, Strategies 3(1): 5 (1990)) were used to prevent loss of clones. In addition, transformed cells were spread directly onto antibiotic diffusion plates, avoiding the usual broth recovery phase, which allows the fastest-growing cells to multiply and be selected.

Plating was carried out as follows. Epicurian coli SURE II supercompetent cells (Stratagene 200152) were thawed on ice and transferred to chilled Falcon 2059 tubes on ice. 1.7 μl of 1.4 M β-mercaptoethanol was added to the cells to a final concentration of 25 mM, and the cells were incubated on ice for 10 min. A 1 μl aliquot of the final ligation was added to the cells, which were incubated on ice for 30 min, heat-pulsed at 42°C for 30 s, and returned to ice for 2 min.
To minimize the preferential growth of any particular transformed cells, the outgrowth period in liquid culture was eliminated from this protocol. Instead, the transformants were plated directly onto nutrient-rich SOB plates containing a 5 ml bottom layer of SOB agar (1.5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl and 1.5% Difco agar per liter). The 5 ml bottom layer was supplemented with 0.4 ml of ampicillin (50 mg/ml) per 100 ml of SOB agar. The 15 ml top layer of SOB agar was supplemented with 1 ml of X-Gal (2%), 1 ml of MgCl₂ (1 M) and 1 ml of MgSO₄ per 100 ml of SOB agar, and was poured just before plating. Our titer was approximately 100 colonies per 10 μl aliquot of the transformation. All colonies, regardless of size, were picked for template preparation. Thus only clones lost because of "toxic" DNA or deleterious gene products are missing from the library, which should increase the number of gaps only slightly above that expected.

To evaluate the quality of the H. influenzae library, sequence data were obtained from approximately 4,000 templates using the M13-21 primer. After 1,300, 1,800, 2,500, 3,200 and 3,800 sequence fragments had been obtained, the random sequence fragments were assembled using AutoAssembler software (Applied Biosystems Division of Perkin-Elmer), and the number of unique assembled base pairs was determined. Based on the equations above, an ideal plot of the number of unsequenced base pairs as a function of the number of sequence fragments obtained, with an average read length of 460 bp, was determined for genomes of 2.5 × 10⁶ and 1.9 × 10⁶ bp (FIG. 3). The progress of the assembly was plotted using the actual data from assemblies of up to 3,800 sequence fragments and compared with the ideal plots (FIG. 3). FIG. 3 shows that the actual assembly data did not deviate essentially from the ideal plot, indicating that the library was constructed as a close approximation to an ideal random library, free of contamination by double-insert chimeras and by vector without insert.
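The ideal curves in FIG. 3 come from the expected number of unsequenced bases, $Le^{-nw/L}$. As a minimal sketch of what an ideal random library should look like, the Python below evaluates that expectation and compares it with a simulation that places reads uniformly at random on a circular chromosome. The uniform-placement model, the fixed 460 bp read length and all function names are assumptions made for illustration; the simulation is not the AutoAssembler analysis used in the study.

```python
import math
import random


def ideal_unsequenced(genome_len: int, n_reads: int, read_len: int) -> float:
    """Expected unsequenced bases under the Poisson model: L * exp(-n w / L)."""
    return genome_len * math.exp(-n_reads * read_len / genome_len)


def simulated_unsequenced(genome_len: int, n_reads: int, read_len: int,
                          seed: int = 0) -> int:
    """Drop reads at uniformly random start positions on a circular genome
    (read_len < genome_len) and count bases that no read covers."""
    rng = random.Random(seed)
    diff = [0] * (genome_len + 1)  # difference array of coverage depth
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        end = start + read_len
        diff[start] += 1
        if end <= genome_len:
            diff[end] -= 1
        else:  # read wraps past the end of the circular chromosome
            diff[0] += 1
            diff[end - genome_len] -= 1
    uncovered, depth = 0, 0
    for pos in range(genome_len):
        depth += diff[pos]
        if depth == 0:
            uncovered += 1
    return uncovered


if __name__ == "__main__":
    for n in (1300, 1800, 2500, 3200, 3800):
        print(n,
              round(ideal_unsequenced(2_500_000, n, 460)),
              round(ideal_unsequenced(1_900_000, n, 460)))
    print("simulated, 1.9 Mb, 3800 reads:",
          simulated_unsequenced(1_900_000, 3800, 460))
```

A real library that contained many chimeras or empty vectors would fall visibly above the ideal curve, because such clones contribute no new genomic sequence.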
3. Random DNA sequencing

High-quality double-stranded plasmid DNA templates (19,687) were prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corporation (Gaithersburg, MD) (Adams et al., Science 252: 1651 (1991); Adams et al., Nature 355: 632 (1992)). All steps of plasmid preparation, from bacterial growth to final DNA purification, were performed in a 96-well format. Template concentration was determined using Hoechst dye and a Millipore Cytofluor. DNA concentrations were not adjusted, but low-yielding templates were identified where possible and not sequenced.

Templates were also prepared from two H. influenzae lambda genomic libraries. An amplified library was constructed in the vector lambda GEM-12 (Promega) and an unamplified library in lambda DASH II (Stratagene). For the unamplified lambda library, H. influenzae Rd KW20 DNA (>100 kb) was partially digested for 6 min at 23°C in a reaction mixture (200 μl) containing 50 μg of DNA, 1× Sau3AI buffer and 20 units of Sau3AI. The digested DNA was phenol extracted and electrophoresed on a 0.5% low-melting agarose gel at 2 V/cm for 7 h. Fragments of 15 to 25 kb were excised and recovered in a final volume of 6 μl. One microliter of fragments was used with 1 μl of DASH II vector (Stratagene) in the recommended ligation reaction. One microliter of the ligation mixture was used per packaging reaction, following the recommended protocol for Gigapack II XL packaging extract (Stratagene, #227711). Phage were plated directly from the packaging mixture without amplification (after dilution with 500 μl of the recommended SM buffer and chloroform treatment), yielding about 2.5 × 10³ pfu/μl. The amplified library was prepared essentially as described above, except that the lambda GEM-12 vector was used. After packaging, about 3.5 × 10⁴ pfu were plated on the restrictive NM539 host.
The lysate was harvested in 2 ml of SM buffer and stored frozen in 7% dimethyl sulfoxide; its phage titer was approximately 1 × 10⁹ pfu/ml. Liquid lysates (10 ml) were prepared from randomly selected plaques, and templates were prepared on an anion-exchange resin (Qiagen).

Sequencing reactions on plasmid templates were carried out on an AB Catalyst LabStation using Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits with the M13 forward (M13-21) and M13 reverse (M13RP1) primers (Adams et al., Nature 368: 474 (1994)). Dye terminator sequencing reactions on lambda templates were carried out in a Perkin-Elmer 9600 thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing Kit. The ends of inserts from the lambda GEM-12 library were sequenced with T7 and SP6 primers, and the ends of inserts from the lambda DASH II library with T7 and T3 primers. Sequencing reactions (28,643) were performed by eight individuals over three months, using an average of 14 AB 373 DNA sequencers per day. All sequencing reactions were analyzed using the Stretch modification of the AB 373, primarily with a 34 cm well-to-read distance. The overall sequencing success rate was 84% for M13-21 sequences, 83% for M13RP1 sequences and 65% for dye terminator reactions. The average usable read length was 485 bp for M13-21 sequences, 444 bp for M13RP1 sequences and 375 bp for dye terminator reactions. Table 5 summarizes the high-throughput sequencing phase of the present invention.

Richards et al. (Automated DNA Sequencing and Analysis, M.D. Adams, C. Fields and J.C. Venter, Eds., Academic Press, London, 1994, Chapter 28) describe the value of using sequences obtained from both ends of sequencing templates to facilitate the ordering of contigs in shotgun assembly projects of lambda and cosmid clones.
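The idea behind using both ends of a template can be sketched simply: if the forward and reverse reads of one clone land in two different contigs, those contigs are probably neighbors, separated by less than the insert length. The Python below is a minimal, hypothetical sketch of counting such clone links; it ignores read orientation and distance, and the data structures and names are assumptions, not the software used in this work.

```python
from collections import Counter


def contig_links(placements: dict, clone_pairs: dict, min_support: int = 1) -> dict:
    """Count clones whose two end reads fall in different contigs.

    placements  -- read name -> contig name (reads that were assembled)
    clone_pairs -- clone name -> (forward read name, reverse read name)
    Returns {(contig_a, contig_b): number_of_supporting_clones}.
    """
    links = Counter()
    for fwd, rev in clone_pairs.values():
        a, b = placements.get(fwd), placements.get(rev)
        if a is None or b is None or a == b:
            continue  # a read is unplaced, or both ends are in one contig
        links[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_support}


if __name__ == "__main__":
    placements = {"c1F": "ctg1", "c1R": "ctg2", "c2F": "ctg2",
                  "c2R": "ctg1", "c3F": "ctg3", "c3R": "ctg3"}
    pairs = {"c1": ("c1F", "c1R"), "c2": ("c2F", "c2R"), "c3": ("c3F", "c3R")}
    print(contig_links(placements, pairs, min_support=2))
```

Requiring at least two supporting clones (`min_support`) is a common guard against a single chimeric or mis-tracked clone; the threshold here is an illustrative choice.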
We weighed the desirability of double-ended sequencing (including the cost savings of needing fewer templates) against the shorter read lengths of sequencing reactions performed with the M13RP1 (reverse) primer compared with the M13-21 (forward) primer. Approximately half of the templates were sequenced from both ends; in total, 9,297 M13RP1 sequencing reactions were performed. Random reverse sequencing reactions were performed on templates that had given successful forward sequencing reactions. Some M13RP1 sequences were obtained in a semi-directed manner: templates whose M13-21 sequences pointed outward at the ends of contigs were specifically selected for M13RP1 sequencing in an effort to order the contigs. The semi-directed strategy was effective, and clone-based ordering formed an integral part of gap closure (see below).

4. Protocol for automated cycle sequencing

Sequencing was performed with eight ABI Catalyst robots and fourteen AB 373 automated DNA sequencers. The Catalyst robot is a sophisticated, commercially available pipetting and temperature-control robot developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates with a reaction mixture consisting of deoxynucleotides and dideoxynucleotides, thermostable Taq DNA polymerase, fluorescently labeled sequencing primers and reaction buffer. Reaction mixtures and templates were combined in the wells of a 96-well aluminum thermocycling plate. Thirty consecutive cycles of linear amplification (that is, extension from a single primer), each consisting of denaturation, annealing of primer and template, and DNA synthesis, were performed. A heated lid with rubber gaskets on the thermocycling plate prevented evaporation without the need for an oil overlay.

Two sequencing protocols were used: dye-labeled primers and dye-labeled dideoxy chain terminators.
Shotgun sequencing involved the use of four dye-labeled sequencing primers, one for each of the four terminator nucleotides. Each dye primer is labeled with a different fluorescent dye, so the four individual reactions can be combined in a single lane of the 373 DNA sequencer for electrophoresis, detection and base calling. AB currently supplies premixed reaction mixtures in bulk packages containing all the non-template reagents required for sequencing. Sequencing can be performed with both dye primers and dye terminators on both plasmid and PCR-generated templates, although plasmid templates generally give longer usable sequences. Thirty-two reactions were run per 373 sequencer each day, for a total of 960 samples. Electrophoresis was run overnight according to the manufacturer's protocol, and data were collected for 12 hours. After electrophoresis and fluorescence detection, the AB 373 performs automatic lane tracking and base calling; lane tracking was confirmed visually. The electropherogram (fluorescent lane trace) of each sequence was inspected visually and its quality assessed. Trailing sequence of poor quality was removed, and the sequence was transferred by software into the Sybase database (backed up daily on 8 mm tape). Leading vector polylinker sequence was removed automatically by a software program. Average read lengths on the standard ABI 373 were around 400 bp and depended largely on the quality of the template used in the sequencing reaction. All ABI 373 sequencers were converted to Stretch Liners, which provide a longer electrophoresis path before fluorescence detection and thereby increased the average number of usable bases to 500-600 bp. Informatics 1.
Data management. Numerous information management systems have been developed for large-scale sequencing laboratories (Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington DC, 585 (1993)). The system used to collect and assemble the sequence data was developed with the Sybase relational data management system and was designed to automate data flow as far as possible and to minimize user error. The database stores and correlates all information collected during the entire operation, from template preparation to final analysis of the genome. Because the raw output of the AB 373 sequencers was based on the Macintosh platform while the chosen data management system was based on the Unix platform, it was necessary to design and implement a variety of user and client/server applications that allow raw data and analysis results to flow seamlessly into the database with minimal user effort. A description of the software programs used for large-scale sequence assembly and management is provided in FIG. 2. Assembly. An assembly engine (TIGR Assembler) was developed for the rapid and accurate assembly of thousands of sequence fragments. AB AutoAssembler (trademark) was modified (and renamed the TIGR Editor) to provide a graphical interface to the electropherogram data associated with the aligned sequence files output by the TIGR Assembler, for editing purposes. The TIGR Editor maintained synchronization between the electropherogram files on the Macintosh platform and the sequence data in the H. influenzae database on the Unix platform. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. To obtain the speed needed to assemble more than 10⁴ fragments, the algorithm builds a lookup table of 10 bp oligonucleotide sequences and uses it to generate a table of potential fragment overlaps. The number of potential overlaps for each fragment determines whether the fragment is likely to lie in a repetitive element.
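The overlap-detection step described above can be illustrated with a minimal Python sketch. It assumes fragments are plain strings keyed by hypothetical IDs, indexes only the forward strand (a real assembler would also index reverse complements), and uses the number of shared 10-mers as the overlap signal. The function names are illustrative and are not the TIGR Assembler's actual code.

```python
from collections import defaultdict
from itertools import combinations

K = 10  # oligonucleotide length used for the lookup table, as in the text


def build_kmer_index(fragments, k=K):
    """Map every k-mer to the set of fragment IDs that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=K):
    """Count distinct shared k-mers for every pair of fragments sharing at least one."""
    index = build_kmer_index(fragments, k)
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def overlap_counts(pairs):
    """Number of potential partners per fragment; unusually high counts hint at repeats."""
    counts = defaultdict(int)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)
```

For example, two fragments that overlap by 13 bases share four distinct 10-mers, while an unrelated fragment shares none. Only pairs surfacing from the table would go on to full alignment, which is what makes the table-first design fast.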
Starting with a single seed sequence fragment, the TIGR Assembler extends the current contig by attempting to add the best-matching fragment on the basis of oligonucleotide content. The current contig and the candidate fragment are aligned with the Smith-Waterman algorithm for optimal gapped alignment (Waterman, M.S., Methods in Enzymology 164: 765 (1988)). The current contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include minimum overlap length, maximum length of unmatched ends and minimum percentage match. These criteria are automatically relaxed by the algorithm in regions of minimal coverage and tightened in regions containing a possible repetitive element. Fragments representing the boundaries of repetitive elements, as well as potentially chimeric fragments, are often rejected because of partial mismatches at the ends of the alignment and are excluded from the current contig. The TIGR Assembler is designed to use clone-size information together with sequences obtained from both ends of each template. It requires that sequence fragments from the two ends of the same template point toward each other in the contig and lie within a certain range of base pairs of each other (definable for each clone on the basis of the known clone-size range for a given library). Assembly of the 24,304 H. influenzae sequence fragments required 30 hours of CPU time on a single SPARCenter 2000 processor with 512 MB of RAM and yielded approximately 210 contigs. Because of the high stringency of the TIGR Assembler, the contigs were then searched against each other with grasta (a modified fasta; Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85: 2444 (1988)). This detected additional overlaps, which allowed the data set to be consolidated into 140 contigs.
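A minimal Python sketch of Smith-Waterman local alignment scoring, the technique named above for aligning a contig with a candidate fragment. The scoring parameters (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; the TIGR Assembler's actual scoring scheme and acceptance thresholds are not given in the text. The function returns only the best score and the position where the best local alignment ends, without a traceback.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_i, end_j) of the best local alignment of a and b.

    end_i and end_j are 1-based positions in a and b where that alignment ends.
    Uses a linear gap penalty; parameter values are illustrative only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero so alignments can restart.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos[0], best_pos[1]
```

For instance, aligning "TTACGTAA" with "ACGT" finds the embedded ACGT with score 8 ending at position 6 of the first string. An assembler in this style would then check overlap length, unmatched-end length and percent identity (which need a traceback) against its match criteria before extending the contig.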
The location of each fragment in the contigs and extensive information on the consensus sequence itself were loaded into the database. 3. Alignment of assembled contigs. After assembly, the relative positions of the 140 contigs were unknown. These contigs were ordered with asm align, which uses several kinds of relationships to identify and order adjacent contigs. This algorithm placed the 140 contigs into 42 groups, leaving a total of 42 physical gaps (regions for which no template DNA was available) and 98 sequence gaps (for which a template was available to fill the gap). Four integrated strategies were developed to order contig groups separated by physical gaps and to fill those gaps. Oligonucleotide primers were designed and synthesized from the ends of each contig group so that they could be used in one or more of the strategies outlined below: 1. Southern analysis was performed with a subset of 72 of these oligonucleotides to generate unique "fingerprints". The method was based on the premise that labeled oligonucleotides homologous to the ends of adjacent contigs would hybridize to the same DNA fragments and therefore share a similar or identical hybridization pattern, or "fingerprint". The oligonucleotides (50 pmol of each 20-mer) were labeled with 250 mCi of [γ-³²P]ATP and T4 polynucleotide kinase and purified on Sephadex G-25 Superfine (Pharmacia). In each Southern hybridization, 10⁷ cpm of labeled oligonucleotide was used against H. influenzae Rd chromosomal DNA digested with one frequent cutter (AseI) and five infrequent cutters (BglII, EcoRI, PstI, XbaI and PvuII). DNA from each digest was fractionated on a 0.7% agarose gel and transferred to a Nytran Plus nylon membrane (Schleicher & Schuell). Hybridization was carried out at 40°C for 16 hours.
To remove nonspecific signals, each blot was washed at room temperature under increasing stringency, up to 0.1× SSC + 0.5% SDS. The blots were exposed to a PhosphorImager cassette (Molecular Dynamics) for several hours, and the hybridization patterns were compared visually. Contigs identified as adjacent in this way were targeted for specific PCR reactions. 2. Peptide links were made by searching the contig ends against the database with blastx (Altschul et al., J. Mol. Biol. 215: 403 (1990)). If the ends of two contigs matched the same database sequence in an appropriate manner, the two contigs were tentatively considered to be adjacent. 3. Two lambda libraries constructed from H. influenzae genomic DNA were probed with oligonucleotides designed from the ends of the contigs (Kirkness et al., Genomics 10: 985 (1991)). Positive plaques were then used to prepare templates, and sequence was determined from each end of the lambda clone inserts. These sequence fragments were searched with grasta against a database of all contigs. Two contigs that matched sequences obtained from opposite ends of the same lambda clone were ordered, and the lambda clone then provided a template for filling the sequence gap between the adjacent contigs. Lambda clones were particularly valuable for resolving repetitive structures. 4. Standard and long-range (XL) PCR reactions were performed to confirm contig orders found by the other methods and to establish new ones, as follows. Standard PCR: each reaction contained 37 μl of a mixture of 16.5 μl H₂O, 3 μl 25 mM MgCl₂, 8 μl dNTP mix (1.25 mM each dNTP), 4.5 μl 10X PCR core buffer II (Perkin Elmer) and 25 ng of H. influenzae Rd KW20 genomic DNA. Two appropriate primers (4 μl, 3.2 pmol/μl) were added to each reaction. After a hot start at 95°C for 5 minutes, the reactions were held at 75°C while 0.3 μl of AmpliTaq DNA polymerase (Perkin Elmer) in 4.3 μl H₂O and 0.5 μl 10X PCR core buffer II was added to each. PCR profile: 25 cycles of denaturation at 94°C for 45 s, annealing at 55°C for 1 min and extension at 72°C for 3 min. All reactions were performed in 96-well format on a Perkin Elmer GeneAmp PCR System 9600. Long-range PCR (XL PCR): each reaction contained 35.2 μl of a mixture of 12.0 μl H₂O, 2.2 μl 25 mM Mg(OAc)₂, 4 μl dNTP mix (200 μM final concentration), 12.0 μl 3.3X PCR buffer and 25 ng of H. influenzae Rd KW20 genomic DNA. Two appropriate primers (5 μl, 3.2 pmol/μl) were added to each reaction. After a hot start at 94°C for 1 minute, 2.0 μl of rTth polymerase (4 U/reaction) in 2.8 μl of 3.3X PCR buffer II was added to each reaction. PCR profile: 18 cycles of denaturation at 94°C for 15 s and annealing/extension at 62°C for 8 min; then 12 cycles of denaturation at 94°C for 15 s and annealing/extension at 62°C for 8 min (increasing by 15 s per cycle); and a final extension at 72°C for 10 min. All reactions were performed in 96-well format on a Perkin Elmer GeneAmp PCR System 9600. Although PCR reactions could in principle be performed for every combination of physical-gap ends, techniques such as Southern fingerprinting, database matching and large-insert clone searching were particularly valuable in reducing the number of PCR reactions needed to order adjacent contigs and completely fill the gaps. Applying these strategies on a larger scale in future genome projects should increase the overall efficiency of closing a complete genome. The number of physical gaps ordered and filled by each of these techniques is summarized in Table 5. Sequence information from the ends of 15-20 kb clones was particularly suitable for filling gaps, resolving repeats and providing general confirmation of the overall genome assembly. We were also interested in the possibility that some fragments of the H.
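As a rough illustration of why the screening techniques were valuable, the arithmetic below counts the unordered pairings of physical-gap ends that exhaustive combinatorial PCR would face. It assumes one primer per contig-group end and two ends per physical gap, which follows from the 42 gaps described above; the count is an illustrative upper bound, not a figure reported in the text.

```python
from math import comb

physical_gaps = 42               # physical gaps remaining after asm align (from the text)
primer_ends = 2 * physical_gaps  # assumption: one primer per contig-group end flanking a gap
pairings = comb(primer_ends, 2)  # unordered pairs of ends, i.e. candidate PCR reactions

print(primer_ends, pairings)     # 84 3486
```

Each fingerprint match, peptide link or lambda-clone link removes most of these candidates for the ends involved, which is why the other strategies were run first.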
influenzae genome could not be cloned in high-copy-number plasmids, and we considered that lambda clones could provide the DNA for these fragments. Approximately 100 random plaques were picked from an amplified lambda library, templates were prepared, and sequence was obtained from each end. These sequences were searched against the contigs (grasta) and linked to the appropriate contigs in the database, and the resulting lambda clone scaffold provided further confirmation of the accuracy of the genome assembly (FIG. 5). In addition to confirming contig structure, lambda clones closed 23 physical gaps. Approximately 78% of the genome was covered by lambda clones. Lambda clones were also useful for resolving repetitive structures. The repeat structures identified in the genome consist of the six ribosomal RNA operons and one 5,340 bp repeat (two copies), all small enough to be spanned by a single clone from a random-insert library. Oligonucleotide probes were designed from the unique flanking sequence of each repeat and hybridized to the lambda library. Positive plaques were identified for each flank, and sequence fragments obtained from the ends of each clone were used to place the repeats correctly in the genome. The six H. influenzae ribosomal RNA (rRNA) operons (16S subunit-23S subunit-5S subunit) tested whether our overall strategy for sequencing and assembling the genome could identify and assemble a complex region containing substantial repeats. Because of the high degree of sequence similarity and the length of the six operons, the global assembly collected all of their sequences into a few contigs that could not be told apart. Determining the correct placement of each operon in the sequence required the unique flanking sequences on both sides of that operon.
No unique flanking sequence could be identified at the left (16S rRNA) end. This region contains the ribosomal promoter and apparently could not be cloned in the high-copy-number pUC18 plasmid. However, unique sequence could be identified at the right (5S) end. Oligonucleotide primers were designed from these six flanking regions and used to screen the two lambda libraries. For each of the six rRNA operons, at least one positive plaque was identified that spanned the complete operon and contained unique flanking sequence at both the 16S and 5S ends. These plaques served as templates for obtaining the unique flanking sequence of each of the six rRNA operons. Further confirmation of the overall structure of the assembled circular genome was obtained by comparing a computer-generated restriction map for the enzymes ApaI, SmaI and RsrII, based on the assembled sequence, with the physical map of Redfield and Lee (Genetic Maps: Locus Maps of Complex Genomes, S.J. O'Brien, ed. (Cold Spring Harbor Laboratory Press, New York, NY, 1990), 2110). The sizes and relative order of the restriction fragments in the sequence-derived map were consistent with those of the physical map (FIG. 5). Editing. Contigs were edited visually by assembling overlapping 10 kb segments of each contig with AB AutoAssembler (registered trademark) and Fast Data Finder (TM) hardware. AutoAssembler provides a graphical interface to the electropherogram data for editing. Electropherogram data were used to assign the most likely base at each position; where a discrepancy could not be resolved or a clear assignment could not be made, the automatic base call was left unchanged. Changes to individual sequences were written to the electropherogram file, and a replication protocol (crash) kept the sequence data in the H. influenzae database consistent with the electropherogram files. After editing, and before annotation, the contigs were reassembled with the TIGR Assembler. Potential frameshifts detected during annotation of the genome were output as reports. These reports display the sequence alignments generated by the alignment software (praze), including the combination of sequences and the predicted most likely position of the missing or inserted base. Apparent frameshifts were used to flag regions of sequence that might require further editing; a frameshift was not corrected unless it was clearly supported by the electropherogram data. Frameshifts were edited with the TIGR Editor. The rRNA operons and other repetitive regions prevented the TIGR Assembler from assembling the complete circular genome; final assembly of the genome was achieved with comb asm, which splices contigs together on the basis of short overlaps. Genome sequence accuracy. The accuracy of the H. influenzae genome sequence is difficult to quantify, because very few H. influenzae sequences were previously available and most of these came from other species. However, three accuracy parameters can be considered. First, on the basis of database similarity, the predicted number of apparent frameshifts in H. influenzae genes is 148. Some of these apparent frameshifts are probably in the database sequences rather than in ours, particularly given that 49 of them are based on matches to hypothetical proteins from other organisms. Second, 188 bases in the genome remain ambiguous (N), or 1 per 9,735 bp. Combining these two types of "known" error, we can calculate a maximum sequence accuracy of 99.98%. Average coverage is 6.5X, and 1% of the genome has 1X coverage. Gene identification. We attempted to predict all coding regions of the H. influenzae Rd genome and to identify genes, tRNAs, rRNAs and other features of the DNA sequence (e.g., repeats, regulatory sites, the replication origin and nucleotide composition).
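The maximum-accuracy figure quoted under genome sequence accuracy can be reproduced with simple arithmetic. This sketch assumes, as the text implies, that each apparent frameshift and each ambiguous N counts as exactly one base error over the 1,830,121 bp genome.

```python
genome_bp = 1_830_121   # length of the circular chromosome
frameshifts = 148       # apparent frameshifts predicted from database similarity
ambiguous_n = 188       # bases left as N

bp_per_n = genome_bp / ambiguous_n
max_accuracy = 1 - (frameshifts + ambiguous_n) / genome_bp

print(f"1 N per {bp_per_n:,.0f} bp")       # 1 N per 9,735 bp
print(f"max accuracy {max_accuracy:.2%}")  # max accuracy 99.98%
```

Because only "known" errors enter the numerator, the result is a ceiling rather than a measured error rate.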
Brief descriptions of some of these sequence features are given below. The H. influenzae Rd genome is a circular chromosome of 1,830,121 bp. The overall G/C content is approximately 38% (A = 31%, C = 19%, G = 19%, T = 31%, IUB ambiguity codes = 0.035%). To look for global structural features, we examined the G/C content of the genome in windows of several lengths. In 5,000 bp windows, the G/C content is relatively uniform except for seven large G/C-rich regions and several A/T-rich regions (FIG. 5). The G/C-rich regions correspond to the positions of the six rRNA operons and a cryptic Mu-like prophage. Genes for several proteins with similarity to proteins encoded by bacteriophage Mu are located between 1.56 and 1.59 Mbp; this region of the genome has a substantially higher G/C content (about 50%) than the rest of the genome (about 38%). The cause or significance of the A/T-rich regions has not yet been established. The minimal origin of replication (oriC) of E. coli is a 245 bp region defined by three copies of a 13 bp repeat containing a GATC core sequence at one end and four copies of a 9 bp repeat containing a TTAT core sequence at the other end. The GATC sites are targets for methylation and control replication, whereas the TTAT sites provide the binding sites for DnaA, the first step in the replication process (B. Lewin, Genes V (Oxford University Press, New York, 1994), Chapters 18-19). An approximately 281 bp sequence (602,483 to 602,764) bounded by these same core sequences appears to specify the origin of replication of H. influenzae Rd. These coordinates lie between the ribosomal operons rrnF, rrnE, rrnD and rrnA, rrnB, rrnC.
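The windowed G/C analysis described above can be sketched as follows. Assumptions: the windows are non-overlapping (the text does not say whether they overlapped), the genome-wide baseline of 0.38 is taken from the text, and `delta` is a hypothetical cutoff for flagging G/C-rich or A/T-rich windows.

```python
def gc_windows(seq, window=5000):
    """Return (start, G/C fraction) for consecutive non-overlapping windows."""
    profile = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, gc / len(chunk)))  # the final window may be shorter
    return profile


def flag_windows(profile, baseline=0.38, delta=0.08):
    """Starts of windows whose G/C fraction differs from the baseline by more than delta."""
    return [start for start, gc in profile if abs(gc - baseline) > delta]
```

On a toy sequence made of an all-A/T block, an all-G/C block and a block at 40% G/C, only the first two windows are flagged. Running the same functions with several `window` values mirrors the multi-scale scan described in the text.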
These two groups of ribosomal operons are transcribed in opposite directions, and the proposed position of the origin is consistent with the polarity of these transcripts. Termination of replication in E. coli, where the two replication forks meet, is marked by two 23 bp termination sequences located approximately 100 bp on either side of the midpoint. Two potential termination sequences sharing a 10 bp core sequence with the E. coli termination sequences were identified in H. influenzae at coordinates 1,375,949 to 1,375,958 and 1,558,759 to 1,558,768. These two sets of coordinates cover a region of approximately 100 kb from the point 180° opposite the proposed H. influenzae origin of replication. Six rRNA operons were identified. Each rRNA operon contains the three rRNA subunits and a variable spacer region in the order 16S subunit-spacer region-23S subunit-5S subunit; the three subunits are 1,539 bp, 2,653 bp and 116 bp long, respectively. The G/C content of the three ribosomal subunits (50%) is higher than that of the genome, whereas the G/C content of the spacer regions (38%) matches the rest of the genome. The nucleotide sequences of the three rRNA subunits are 100% identical in all six ribosomal operons. The rRNA operons fall into two classes according to the spacer region between the 16S and 23S sequences. The shorter spacer is 478 bp long (rrnB, rrnE and rrnF) and contains the tRNA-Glu gene; the longer spacer is 723 bp long (rrnA, rrnC and rrnD) and contains the tRNA-Ile and tRNA-Ala genes. Within each group of three operons, the spacer regions are also 100% identical. tRNA genes are also present at the 16S and 5S ends of two of the rRNA operons: the tRNA-Arg, tRNA-His and tRNA-Pro genes are located at the 16S end of rrnE, and the tRNA-Trp and tRNA-Asp genes are located at the 5S end of rrnA. The predicted coding regions of the H.
influenzae genome were initially identified with GeneMark (Borodovsky and McIninch, Computers Chem. 17(2): 123 (1993)), using a codon-frequency matrix derived from 122 H. influenzae coding sequences. The predicted coding regions (plus 300 bp of flanking sequence) were used to search a database of non-redundant bacterial proteins (NRBP) built specifically for the annotation. To build NRBP, all DNA coding regions were extracted from GenBank (Release 85), and sequences from the same species were searched against each other; sequences with >97% similarity over regions of >100 nucleotides were combined. The sequences were then translated and compared with all protein sequences in Swiss-Prot (Release 30), and sequences from the same species with >98% similarity over 33 amino acids were combined. NRBP consists of 23,751 genes from 1,099 different species, derived from 21,445 GenBank sequences and 11,183 Swiss-Prot sequences. A total of 1,749 predicted coding regions were identified. The predicted H. influenzae coding regions were searched against NRBP by translating each query DNA sequence in the three reading frames of both strands, identifying protein sequences that matched the query, and aligning the protein-protein matches with praze, an implementation of a modified Smith-Waterman algorithm (Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85: 2444 (1988)). When an insertion or deletion in the DNA sequence caused a frameshift error, the alignment, starting from the region of maximum similarity, was extended through the 300 bp flanking region to the same database match in the other frame. Regions identified as containing frameshift errors were output from the database as reports and evaluated for possible correction.
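The search step above translates each query in all six reading frames (three per strand). A minimal Python sketch, assuming the standard genetic code and reporting codons that contain ambiguous bases (such as N) as 'X'; the names are illustrative and unrelated to the praze or GeneMark programs.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return {frame_label: protein} for frames +1..+3 (forward) and -1..-3 (reverse)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame("ATGGCCTAA")["+1"]` gives "MA*", with '*' marking the stop codon. Each of the six translations would then be searched against the protein database.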
Unidentified predicted coding regions and the remaining intergenic sequences were searched against a data set of all peptide sequences available from Swiss-Prot, PIR and GenBank. Identification of operon structure would be facilitated by experimental determination of transcription promoters and termination sites. Each putatively identified H. influenzae gene was assigned to one of the biological role categories of Riley (Riley, M., Microbiological Reviews 57(4): 862 (1993)). Assignments were made by comparing the protein sequence of each predicted coding region with the Swiss-Prot sequences in the relational database. Of the 1,749 predicted coding regions, 724 have not been assigned a role: 384 had no database match, and 340 matched proteins annotated as "hypothetical proteins". Role assignments were made for 1,025 predicted coding regions. A compilation of the predicted coding regions, with their unique identifiers, three-letter gene identifiers, percent identity, percent similarity and amino acid match length, is given in Table 1(a). An annotated map of the complete H. influenzae Rd genome is shown in FIGS. The map shows the position of each predicted coding region on the H. influenzae chromosome and its direction of transcription, with color encoding its role assignment. Role assignments are also shown in FIG. The genome sequence now makes it possible to study the genes of H. influenzae Rd and their chromosomal organization. To survive as a free-living organism, to meet its nutritional requirements for growth in the laboratory and, in particular, to account for its pathogenicity and virulence, H. influenzae must possess features that distinguish it from related organisms. The genome would be expected to contain the complete complement of the classes of genes known to be essential for life. For example, the published protein sequences of the E. coli ribosomal proteins correspond one-to-one with potential homologs in the H. influenzae database. Similarly, as shown in Table 1(a), the genome contains aminoacyl-tRNA synthetase genes for the amino acids. Finally, the tRNA genes were mapped on the genome; 54 tRNA genes were identified, covering the 20 standard amino acids. To survive as a free-living organism, H. influenzae must generate energy in the form of ATP by fermentation and/or electron transport. As a facultative anaerobe, H. influenzae Rd is known to ferment glucose, fructose, galactose, ribose, xylose and fucose (Dorocicz et al., J. Bacteriol. 175: 7142 (1993)). The genes identified in Table 1(a) indicate that a phosphoenolpyruvate-phosphotransferase system (PTS) and associated transport systems can be used for the uptake of these sugars. Genes for the general PTS phosphocarrier proteins enzyme I and HPr (ptsI and ptsH), as well as the glucose-specific crr gene, were identified; ptsI, ptsH and crr make up the pts operon. However, we have not identified the membrane-bound glucose-specific enzyme II, which is required for glucose transport by the PTS. A complete fructose PTS was identified. Genes encoding the complete glycolytic pathway and the production of fermentation end products have been identified. A capacity for anaerobic respiration was shown by the identification of genes encoding functional electron transport systems that use inorganic electron acceptors such as nitrate, nitrite and dimethyl sulfoxide. Three enzymes of the tricarboxylic acid (TCA) cycle do not appear to be encoded in the genome: citrate synthase, isocitrate dehydrogenase and aconitase (aconitate hydratase) were not found, either by searching the predicted coding regions or by using the E. coli enzymes as peptide queries against a translation of the whole genome.
This is consistent with the very high glutamate level (1 g/L) required in defined culture medium (Klein and Luginbuhl, J. Gen. Microbiol. 113: 409 (1979)). Glutamate can enter the TCA cycle after conversion to α-ketoglutarate by glutamate dehydrogenase. In the absence of a complete TCA cycle, glutamate probably serves as a carbon source for amino acid biosynthesis through precursors diverted from the TCA cycle. A functional electron transport system can use oxygen as the terminal electron acceptor to produce ATP. Unanswered questions about pathogenicity and virulence can be addressed by examining particular classes of genes, such as those for adhesins and for lipooligosaccharide synthesis. Moxon and co-workers (Weiser et al., Cell 59: 657 (1987)) have presented evidence that these virulence-related genes contain tandemly repeated tetramers, and that these repeats frequently gain or lose one or more repeat units during replication, altering the reading frame of the gene and thereby its expression. With the complete genome sequence, it is now possible to locate all such repeat regions (FIG. 5) and to begin determining their role in the phase variation of potential virulence genes. H. influenzae Rd has a highly efficient natural DNA transformation system (Kahn and Smith, J. Membrane Biol. 81: 89 (1984)). A specific DNA uptake sequence, 5'-AAGTGCGGT, present in multiple copies in the genome, has been shown to be required for efficient DNA uptake. It is now possible to locate all of these sites and to describe their distribution fully. Fifteen genes involved in transformation have been described and sequenced (Redfield, R., J. Bacteriol. 173: 5612 (1991); Chandler, M., Proc. Natl. Acad. Sci. U.S.A. 89: 1626 (1992); Barouki and Smith, J. Bacteriol. 163(2): 629 (1985); Tomb et al., Gene 104: 1 (1991); Tomb, J., Proc. Natl. Acad. Sci. U.S.A. 89: 10252 (1992)). Six genes, comA to comF, make up an operon that is apparently controlled by a competence regulatory element (CRE), a 22 bp palindrome located about one helical turn upstream of the promoter. The rec-2 transformation gene is also controlled by this element. It is now possible to locate additional copies of the CRE in the genome and to discover potential transformation genes under CRE control. In addition, a wide range of other regulatory elements that could not be found before can now be readily identified. One well-described gene regulatory system in bacteria is the "two-component" system, composed of a sensor molecule that detects a specific environmental signal and a regulatory molecule that is phosphorylated by the activated form of the sensor. The regulatory protein is generally a transcription factor that, when activated, turns a specific set of genes on or off (for reviews, see Albright et al., Ann. Rev. Genet. 23: 311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26: 71 (1992)). E. coli is believed to have 40 sensor-regulator pairs (Albright et al., Ann. Rev. Genet. 23: 311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26: 71 (1992)). The H. influenzae genome was searched with tblastn and tfasta using proteins from the sensor and regulator protein families. Four sensors and five regulatory proteins similar to proteins from other species were identified (Table 6). Each regulatory protein except CpxR appears to have a corresponding sensor. A search with the E. coli CpxA protein identified three of the four sensors shown in Table 6 but no additional significant match; the level of sequence similarity may be too low to be detected by tfasta.
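With a complete sequence, the searches proposed earlier in this passage (locating every copy of the 5'-AAGTGCGGT uptake sequence and every tandem tetranucleotide repeat) reduce to simple pattern scans. A minimal Python sketch: the minimum of four repeat copies is a hypothetical cutoff, minus-strand hits are reported at their start coordinate on the plus strand, and the repeat finder also reports units that are themselves shorter repeats (e.g. AAAA).

```python
import re

UPTAKE_SITE = "AAGTGCGGT"  # DNA uptake sequence quoted in the text


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_uptake_sites(genome):
    """Return sorted (0-based position, strand) hits of the uptake site on both strands."""
    genome = genome.upper()
    # Lookahead patterns let overlapping occurrences be reported too.
    hits = [(m.start(), "+") for m in re.finditer(f"(?={UPTAKE_SITE})", genome)]
    rc_site = reverse_complement(UPTAKE_SITE)
    hits += [(m.start(), "-") for m in re.finditer(f"(?={rc_site})", genome)]
    return sorted(hits)


def find_tetramer_repeats(genome, min_copies=4):
    """Return (start, unit, copies) for tandem runs of a 4 bp unit with >= min_copies copies."""
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_copies - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // 4)
            for m in pattern.finditer(genome.upper())]
```

Run over the full 1,830,121 bp sequence, the first function gives the positions and strand distribution of the uptake sites, and the second gives candidate phase-variable loci whose repeat counts could then be compared between strains.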
No representatives of the NtrC class of regulatory proteins were found. This class of proteins interacts directly with the sigma-54 subunit of RNA polymerase, which is not present in H. influenzae. All of the regulatory proteins fall into the OmpR subclass (Albright et al., Ann. Rev. Genet. 23: 311 (1989); Parkinson and Kofoid, Ann. Rev. Genet. 26: 71 (1992)). The H. influenzae phoBR and basRS genes are adjacent to each other and may form an operon. The nar and arc genes are not adjacent to each other. Among the most interesting questions that a complete genome sequence can answer are those about missing genes or pathways. The nonpathogenic H. influenzae Rd strain differs markedly from pathogenic serotype b strains, and many of the differences between these strains appear to be factors affecting infectivity. For example, the fimbrial gene cluster involved in adhesion between the bacterium and host cells (Mol. Microbiol. 13: 673 (1994)) is now shown to be absent from the Rd strain. The pepN and purE genes, which flank the fimbrial cluster in H. influenzae type b strains, are adjacent to each other in the Rd strain (FIG. 7), suggesting that the entire cluster has been excised. At a broader level, we determined which E. coli proteins are absent from H. influenzae by using a nonredundant set of 1,216 predicted E. coli protein sequences from the University of Wisconsin E. coli Genome Project (GenBank accessions D10483, L10328, U00006, U00039, U14003 and U118997; Yura et al., Nucleic Acids Research 20: 3305 (1992); Burland et al., Genomics 16: 551 (1993)). A low match threshold was used so that even weak matches were scored as positive, giving a minimum estimate of the number of E. coli genes not found in H. influenzae. Each E. coli protein was searched against the complete H. influenzae genome with tblastn, and a BLAST score >100 was considered a match.
In total, 627 E. coli proteins matched at least one region of the H. influenzae genome, and 589 proteins did not. Examination of the 589 unmatched proteins showed that they include a disproportionate number of hypothetical E. coli proteins: 68% of the E. coli proteins of identified function matched H. influenzae sequences, whereas only 38% of the hypothetical proteins did. A protein is annotated as hypothetical when it does not match any other known protein (Yura et al., Nucleic Acids Research 20: 3305 (1992); Burland et al., Genomics 16: 551 (1993)). At least two possibilities could explain the overrepresentation of hypothetical proteins among the unmatched proteins: either the hypothetical proteins are not actually translated (at least not in the annotated frames), or they are E. coli-specific proteins that are not found in other species except very closely related ones such as Salmonella typhimurium.

A total of 384 predicted coding regions showed no similarity to six-frame translations of GenBank Release 87. These unidentified coding regions were compared with one another using fasta, and several new gene clusters were identified. For example, two predicted coding regions that do not match the database (HI0591 and HI0852) share 75% identity over nearly their full lengths (139 and 143 amino acid residues, respectively). That these regions are similar to each other but do not match any protein in the current databases suggests that they may represent a novel cellular function. Other types of analysis can also be applied to the unidentified coding regions, including hydropathy analysis, which can reveal the pattern of membrane-spanning domains often conserved among members of receptor and transporter gene families even in the absence of significant amino acid identity.
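Hydropathy analysis of the kind mentioned above is commonly done by averaging a residue hydrophobicity scale over a sliding window. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, values often used to flag candidate membrane-spanning segments; these parameters and the function names are assumptions rather than the analysis actually performed here, and the code accepts only the 20 standard amino acids.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence (rolling sum)."""
    values = [KD[aa] for aa in protein.upper()]  # KeyError on non-standard residues
    if len(values) < window:
        return []
    current = sum(values[:window])
    profile = [current / window]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]
        profile.append(current / window)
    return profile


def candidate_tm_segments(protein: str, window: int = 19,
                          cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Merge consecutive above-cutoff windows into (start, end) residue spans.

    Coordinates are 0-based with an exclusive end.
    """
    segments = []
    start = None
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score > cutoff and start is None:
            start = i
        elif score <= cutoff and start is not None:
            # Last above-cutoff window started at i - 1 and covers window residues.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(protein)))
    return segments


if __name__ == "__main__":
    # Hypothetical toy protein: one hydrophobic stretch in a charged background.
    toy = "K" * 20 + "L" * 21 + "K" * 20
    print(candidate_tm_segments(toy))  # [(15, 46)]
```

The reported boundaries are window-smoothed, so they extend somewhat beyond the hydrophobic stretch itself (residues 20 to 40 in the toy example); a periodic pattern of several such segments is what suggests a membrane-bound channel or transporter.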
FIG. 5 shows five examples of unidentified predicted coding regions that contain potential transmembrane domains with the periodic pattern characteristic of membrane-bound channel proteins. Such information can be used to focus targeted deletion or mutation of these genes on the specific aspects of cell function they are likely to affect.

Interest in the medically important features of H. influenzae biology has focused particularly on the genes that determine the organism's virulence traits. The catalase gene has been characterized and sequenced as a potential virulence-related gene (Bishai et al., J. Bacteriol. 176: 2914 (1994)). A number of genes responsible for the capsular polysaccharide have been mapped and sequenced (Kroll et al., Mol. Microbiol. 5(6): 1549 (1991)). Several outer membrane protein genes have been identified and sequenced (Langford et al., J. Gen. Microbiol. 138: 155 (1992)). The genes for the lipooligosaccharide component of the outer membrane and its biosynthetic pathway have been studied intensively (Weiser et al., J. Bacteriol. 172: 3304 (1990)). Although vaccines are available, studies of outer membrane components have been motivated in part by the need for improved vaccines.

Data availability

The H. influenzae genome sequence has been deposited in the Genome Sequence Data Base (GSDB) under accession number L42023. The nucleotide sequence and peptide translation of each predicted coding region, with its identified start and stop codons, have also been deposited in the GSDB.

Production of antibodies against H. influenzae proteins

Substantially pure protein or polypeptide is isolated from transfected or transformed cells using any of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized.
The concentration of protein in the final preparation is adjusted, for example by concentration on an Amicon filter device, to a level of a few micrograms per ml. Monoclonal or polyclonal antibodies to the protein can then be prepared as follows.

Production of monoclonal antibodies by hybridoma fusion

Monoclonal antibodies to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C. (Nature 256: 495 (1975)) or modifications of it. Briefly, a mouse is repeatedly inoculated over a period of a few weeks with a few micrograms of the selected protein. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused with mouse myeloma cells by means of polyethylene glycol, and the excess unfused cells are destroyed by growing the system on selective medium containing aminopterin (HAT medium). The successfully fused cells are diluted, and aliquots of the dilution are placed in the wells of a microtiter plate, where growth of the culture is continued. Antibody-producing clones are identified by detecting antibody in the supernatant of the wells by immunoassay procedures such as ELISA, as originally described by Engvall, E. (Meth. Enzymol. 70: 419 (1980)), and modifications of it. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. (Basic Methods in Molecular Biology, Elsevier, New York, Chapter 21-2 (1989)).

Production of polyclonal antibodies by immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and to the host species.
For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvants. In addition, host animals vary in their response to the site of inoculation and the dose, and either inadequate or excessive doses of antigen can result in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be the most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. (J. Clin. Endocrinol. Metab. 33: 988-991 (1971)).

Booster injections can be given at regular intervals, and the antiserum harvested when its antibody titer, as determined semi-quantitatively (for example, by double immunodiffusion in agar against known concentrations of the antigen), begins to fall. See, for example, Ouchterlony, O. et al. (Handbook of Experimental Immunology, Weir, D., ed., Blackwell, Chapter 19 (1973)). The plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 1-2 μM). The affinity of the antiserum for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D. (Manual of Clinical Immunology, 2nd ed., Rose and Friedman, eds., Amer. Soc. for Microbiology, Washington, D.C., Chapter 42 (1980)).

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays that determine the concentration of antigen-bearing substances in biological samples; they can also be used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.

Preparation of PCR primers and amplification of DNA

Various fragments of the H. influenzae Rd genome, such as those disclosed in Tables 1(a) and 2, can be used in accordance with the present invention to prepare PCR primers for a variety of applications. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases, in length.
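The primer guidelines in this Example (a length of at least 15, preferably at least 18, bases, and similar G/C ratios within a pair so that melting temperatures are similar) lend themselves to a quick automated check. The sketch below is illustrative only: the Wallace rule (2 degrees C per A/T, 4 degrees C per G/C) is a rough melting-temperature estimate, and the 10% maximum G/C difference, the example primers and the function names are assumptions, not values from the text.

```python
from __future__ import annotations


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule: 2 per A/T, 4 per G/C.

    Intended for short oligonucleotides; nearest-neighbour models are more
    accurate for longer primers.
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 2.0 * at + 4.0 * gc


def check_primer_pair(fwd: str, rev: str, min_len: int = 18,
                      max_gc_diff: float = 0.10) -> list[str]:
    """Return warnings for a primer pair; an empty list means it passes these checks."""
    warnings = []
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if len(primer) < min_len:
            warnings.append(f"{name} primer is {len(primer)} nt (< {min_len})")
    if abs(gc_fraction(fwd) - gc_fraction(rev)) > max_gc_diff:
        warnings.append(f"G/C ratios differ by more than {max_gc_diff:.0%}; "
                        "melting temperatures may differ")
    return warnings


if __name__ == "__main__":
    fwd = "ATGCGTACGTTAGCCTAG"  # hypothetical 18-mer, 50% G/C
    rev = "TCAGGCATTGACCTGATC"  # hypothetical 18-mer, 50% G/C
    print(wallace_tm(fwd), wallace_tm(rev))  # 54.0 54.0
    print(check_primer_pair(fwd, rev))       # []
```

A real design workflow would also screen for self-complementarity and primer-dimer formation, which this check does not attempt.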
When primer sequences are selected, the two primers of a pair preferably have approximately the same G/C ratio, so that their melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples below.

Gene expression from DNA sequences corresponding to ORFs

A fragment of the H. influenzae genome provided in Table 1(a) or 2 is introduced into an expression vector using conventional techniques. (Techniques for transferring cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art.) Commercially available vectors and expression systems can be obtained from a variety of suppliers, including Stratagene (La Jolla, CA), Promega (Madison, WI) and Invitrogen (San Diego, CA). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence can be optimized for a particular expression organism, as described by Hatfield et al., U.S. Pat. No. 5,082,767, which is incorporated herein by reference.

The following is provided as one exemplary method of generating polypeptide(s) from a cloned ORF of the Haemophilus genome fragment. Because the ORF is of bacterial origin and lacks a polyA sequence, a polyA sequence can be added to the construct, for example by splicing it out of pSG5 (Stratagene) with the BglI and SalI restriction endonucleases and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney murine leukemia virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes the herpes simplex thymidine kinase promoter and the selectable neomycin gene.
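The PCR cloning step that follows places a PstI site in the 5' primer and a BglII site in the 3' primer. A minimal sketch of that primer bookkeeping: it prepends a recognition site to the 5' end of a gene-specific primer and checks whether either site already occurs inside the insert, where it would be cut during digestion. The recognition sequences are the standard ones (PstI CTGCAG, BglII AGATCT); the three-base leader added 5' of each site, the example primer sequences and the function names are assumptions.

```python
from __future__ import annotations

# Standard recognition sequences (5'->3') of the two enzymes used in the cloning step.
SITES = {"PstI": "CTGCAG", "BglII": "AGATCT"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tail_primer(gene_specific: str, enzyme: str, leader: str = "CCG") -> str:
    """Prepend leader + recognition site to the 5' end of a gene-specific primer.

    The leader length and sequence are assumptions; extra 5' bases are a common
    precaution because many enzymes cut poorly at the very end of a DNA fragment.
    """
    return leader.upper() + SITES[enzyme] + gene_specific.upper()


def internal_sites(insert: str) -> list[str]:
    """Names of enzymes whose site already occurs in the insert (on either strand)."""
    insert = insert.upper()
    return [name for name, site in SITES.items()
            if site in insert or reverse_complement(site) in insert]


if __name__ == "__main__":
    fwd = tail_primer("atgaaacgtattgcaccg", "PstI")   # hypothetical ORF start
    rev = tail_primer("ttacgcttcaggtaaacg", "BglII")  # hypothetical reverse primer
    print(fwd)  # CCGCTGCAGATGAAACGTATTGCACCG
    print(rev)  # CCGAGATCTTTACGCTTCAGGTAAACG
    print(internal_sites("AAACTGCAGTTT"))  # ['PstI']
```

PstI and BglII sites are both palindromic, so the reverse-complement test is redundant for them; it is kept so the check stays correct if a non-palindromic site is added to `SITES`.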
The Haemophilus DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Haemophilus DNA, with a PstI restriction endonuclease sequence incorporated into the 5' primer and a BglII sequence at the 5' end of the corresponding Haemophilus DNA 3' primer, taking care to ensure that the Haemophilus DNA is positioned so that it is followed by the polyA sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt-ended with an exonuclease, digested with BglII, purified, and ligated to pXT1, which now contains a polyA sequence and has been digested with BglII.

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, NY) under the conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 μg/ml G418 (Sigma, St. Louis, MO). The protein is preferably released into the supernatant. However, if the protein has membrane-binding domains, it may additionally be retained within the cell, or its expression may be restricted to the cell surface.

Because it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Haemophilus DNA sequence are injected into mice to generate antibodies to the polypeptide encoded by the Haemophilus DNA.

If antibody production is not possible, the Haemophilus DNA sequence is additionally incorporated into a eukaryotic expression vector and expressed, for example, as a chimera with β-globin. Antibodies to β-globin are used to purify the chimera. Corresponding protease cleavage sites engineered between the β-globin gene and the Haemophilus DNA are then used to separate the two polypeptide fragments from each other after translation. One useful expression vector for generating β-globin chimeras is pSG5 (Stratagene). This vector encodes rabbit β-globin.
Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. The techniques described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (cited above), and many of the methods are available from the technical support representatives of Stratagene, Life Technologies, Inc., or Promega. The polypeptide can additionally be produced from either construct using an in vitro translation system such as the In vitro Express™ Translation Kit (Stratagene).

Although the present invention has been described in some detail for purposes of clarity of understanding, it will be understood that various changes in form and detail can be made without departing from the true scope of the invention. All patents, patent applications and publications referred to above are hereby incorporated by reference.

Fatty acid/phospholipid metabolism
Acetyl coenzyme A acetyltransferase (thiolase) (fadA) {Clostridium acetobutylicum}
FadR protein involved in fatty acid metabolism (fadR) {E. coli}
(3R)-hydroxymyristoyl acyl carrier protein dehydrase (fabZ) {E. coli}
3-ketoacyl-acyl carrier protein reductase (fabG) {E. coli}
Acetyl-CoA carboxylase (accA) {E. coli}
Acyl carrier protein (acpP) {E. coli}
Acyl-CoA thioesterase II (tesB) {E. coli}
Beta-ketoacyl-ACP synthase (fabB) {E. coli}
Beta-ketoacyl-acyl carrier protein synthase III (fabH) {E. coli}
Biotin carboxyl carrier protein (accB) {E. coli}
Biotin carboxylase (accC) {E. coli}
D-3-hydroxydecanoyl-(acyl carrier protein) dehydratase (fabA) {E. coli}
Diacylglycerol kinase (dgkA) {E. coli}
Long-chain fatty acid CoA ligase {Homo sapiens}
Malonyl coenzyme A-acyl carrier protein transacylase (fabD) {E. coli}
Short-chain alcohol dehydrogenase homolog (envM) {E. coli}
USG-1 protein (usg) {E. coli}
1-acyl-glycerol-3-phosphate acyltransferase (plsC) {E. coli}
CDP-diglyceride synthetase (cdsA) {E. coli}
Glycerol-3-phosphate acyltransferase (plsB) {E. coli}
Phosphatidylglycerophosphate phosphatase B (pgpB) {E. coli}
Phosphatidylglycerophosphate synthase (pgsA) {E. coli}
Phosphatidylserine decarboxylase proenzyme (psd) {E. coli}
Phosphatidylserine synthase (pssA) {E. coli}
Protein D (hpd) {H. influenzae}

Purines, pyrimidines, nucleosides and nucleotides

Purine ribonucleotide biosynthesis
5'-phosphoribosyl-5-amino-4-imidazole carboxylase II (purK) {E. coli}
5'-phosphoribosyl-5-aminoimidazole synthetase (purM) {E. coli}
5'-guanylate kinase (gmk) {E. coli}
Adenylate kinase (ATP-AMP transphosphorylase) (adk) {H. influenzae}
Adenylosuccinate lyase (purB) {E. coli}
Adenylosuccinate synthetase (purA) {E. coli}
Amidophosphoribosyltransferase (purF) {E. coli}
Formylglycinamide ribonucleotide synthetase (purL) {E. coli}
Formyltetrahydrofolate hydrolase (purU) {E. coli}
GuaA protein (guaA) {E. coli}
Inosine-5'-monophosphate dehydrogenase (guaB) {Acinetobacter calcoaceticus}
Nucleoside diphosphate kinase (ndk) {E. coli}
Phosphoribosylamine-glycine ligase (purD) {E. coli}
Phosphoribosylaminoimidazole carboxylase catalytic subunit (purE) {H. influenzae}
Phosphoribosylaminoimidazolecarboxamide formyltransferase (purH) {E. coli}
Phosphoribosylglycinamide formyltransferase (purN) {E. coli}
Phosphoribosyl pyrophosphate synthetase (prsA) {S. typhimurium}
SAICAR synthetase (purC) {B. pneumoniae}

Pyrimidine ribonucleotide biosynthesis
Dihydroorotate dehydrogenase (dihydroorotate oxidase) (pyrD) {E. coli}
Orotate phosphoribosyltransferase (pyrE) {E. coli}
Orotidine 5'-monophosphate (OMP) decarboxylase encoded by the pyrF operon {E. coli}
PyrF protein (pyrF) {E. coli}
Uracil phosphoribosyltransferase (pyrR) {Bacillus caldolyticus}

2'-deoxyribonucleotide metabolism
Anaerobic ribonucleoside-triphosphate reductase (nrdD) {E. coli}
Deoxycytidine triphosphate deaminase (dcd) {E. coli}
Deoxyuridine triphosphatase (dut) {E. coli}
Glutaredoxin (grx) {E. coli}
NrdB protein (nrdB) {E. coli}
Ribonucleoside-diphosphate reductase 1 alpha chain (nrdA) {E. coli}
Thioredoxin reductase (trxB) {E. coli}
Thymidylate synthetase (thyA) {E. coli}

Salvage of nucleosides and nucleotides
2',3'-cyclic-nucleotide 2'-phosphodiesterase (cpdB) {E. coli}
Adenine phosphoribosyltransferase (apt) {E. coli}
Adenosine tetraphosphatase (apaH) {E. coli}
Cytidine deaminase (cytidine aminohydrolase) (cdd) {E. coli}
Cytidylate kinase (cmk) {E. coli}
Cytidylate kinase (cmk) {E. coli}
Purine-nucleoside phosphorylase (deoD) {E. coli}
Thymidine kinase (tdk) {E. coli}
Uracil phosphoribosyltransferase (upp) {E. coli}
Uridine phosphorylase (udp) {E. coli}
Xanthine-guanine phosphoribosyltransferase gpt (xgprt) {E. coli}
Xanthine-guanine phosphoribosyltransferase (xgprt) {S. typhimurium}
Putative ATPase (mrp) {E. coli}

Sugar-nucleotide biosynthesis and conversions
5'-nucleotidase (ushA) {Homo sapiens}
CMP-NeuNAc synthetase (siaB) {Neisseria meningitidis}
Galactose-1-phosphate uridylyltransferase (galT) {H. influenzae}
Glucose-1-phosphate uridylyltransferase (galU) {E. coli}
UDP-glucose 4-epimerase (galactowaldenase) (galE) {H. influenzae}
UDP-N-acetylglucosamine pyrophosphorylase (glmU) {E. coli}

Nucleotide and nucleoside interconversions
Deoxyguanosine triphosphate triphosphohydrolase (dgt) {E. coli}
Uridine kinase (uridine monophosphokinase) (udk) {E. coli}

Regulatory functions
Adenylate cyclase (cyaA) {H. influenzae}
Anaerobic respiratory control protein ArcA (dye resistance protein) (arcA) {E. coli}
Anaerobic respiratory control sensor protein (arcB) {E. coli}
AraC-like transcriptional regulator {Streptomyces lividans}
Arginine repressor protein (argR) {E. coli}
ArsC protein (arsC) {plasmid R773}
ATP-dependent proteinase (lon) {E. coli}
ATP:GTP 3'-pyrophosphotransferase (relA) {E. coli}
Carbon starvation protein (cstA) {E. coli}
Carbon storage regulator (csrA) {E. coli}
Cyclic AMP receptor protein (crp) {H. influenzae}
Cyclic AMP receptor protein (crp) {H. influenzae}
cys regulon transcriptional activator (cysB) {E. coli}
Ferric uptake regulation protein (fur) {E. coli}
Pilin transcriptional regulator (pilB) {Neisseria gonorrhoeae}
Pilin transcriptional regulator (pilB) {Neisseria gonorrhoeae}
Folylpolyglutamate-dihydrofolate synthetase expression regulator (accD) {E. coli}
Fumarate and nitrate reduction regulatory protein (fnr) {E. coli}
Galactose operon repressor (galS) {H. influenzae}
Glucokinase regulator {Rattus norvegicus}
Glycerol-3-phosphate regulon repressor (glpR) {E. coli}
Glycerol-3-phosphate regulon repressor (glpR) {E. coli}
Glycine cleavage system transcriptional activator (gcvA) {E. coli}
GTP-binding protein (era) {E. coli}
GTP-binding protein (obg) {Bacillus subtilis}
Hydrogen peroxide-inducible activator (oxyR) {E. coli}
L-fucose operon activator (fucR) {E. coli}
lacZ expression regulator (icc) {E. coli}
Leucine-responsive regulatory protein (lrp) {E. coli}
Leucine-responsive regulatory protein (lrp) {E. coli}
LexA repressor (lexA) {E. coli}
Lipooligosaccharide protein (lex2A) {H. influenzae}
Lipooligosaccharide protein (lex2A) {H. influenzae}
metF aporepressor (metJ) {E. coli}
Molybdenum transport system / alternative nitrogenase regulator (modD) {Rhodobacter capsulatus}
MsbB protein (msbB) {E. coli}
MsbB protein (msbB) {E. coli}
Negative regulator of translation (relB) {E. coli}
Negative rpo regulator (mclA) {E. coli}
Nitrate sensor protein (narQ) {E. coli}
Nitrate/nitrite response regulator protein (narP) {E. coli}
Nitrogen regulatory protein P-II (glnB) {E. coli}
Guanosine pentaphosphate 3'-pyrophosphohydrolase (spoT) {E. coli}
Phosphate regulon sensor protein (phoR) {E. coli}
Phosphate regulon transcriptional regulatory protein (phoB) {E. coli}
Putative nadAB transcriptional regulator (nadR) {E. coli}
Purine nucleotide synthesis repressor protein (purR) {E. coli}
Putative murein gene regulator (bolA) {E. coli}
rbs repressor (rbsR) {E. coli}
Regulatory protein (asnC) {E. coli}
Regulatory protein Sfs1 involved in maltose metabolism (sfsA) {E. coli}
Cytochrome P450 repressor (Bm3R1) {Bacillus megaterium}
RNA polymerase sigma-32 factor (heat shock regulatory protein F33.4) (rpoH) {E. coli}
RNA polymerase sigma-70 factor (rpoD) {E. coli}
RNA polymerase sigma-E factor (rpoE) {E. coli}
Sensor protein for basR (basS) {E. coli}
Stringent starvation protein (sspB) {E. coli}
Stringent starvation protein A (sspA) {H. influenzae}
Trans-activator of metE and metH (metR) {H. influenzae}
Transcriptional activator (tenA) {Bacillus subtilis}
Transcriptional activator protein (ilvY) {E. coli}
Transcriptional regulatory protein (basR) {E. coli}
Transcriptional regulatory protein (tyrR) {E. coli}
Tryptophan repressor (trpR) {Enterobacter aerogenes}
uxu operon regulator (uxuR) {E. coli}
Xylose operon regulatory protein (xylR) {E. coli}

Replication

DNA replication, restriction/modification, recombination
A/G-specific adenine glycosylase (mutY) {E. coli}
Chromosomal replication initiator protein (dnaA) {E. coli}
Chromosomal replication initiator protein (dnaA) {E. coli}
Crossover junction endodeoxyribonuclease (ruvC) {E. coli}
Dfp protein (dfp) {E. coli}
DNA adenine methylase (dam) {E. coli}
DNA gyrase, subunit A (gyrA) {E. coli}
DNA gyrase, subunit B (gyrB) {E. coli}
DNA helicase II (uvrD) {H. influenzae}
DNA ligase (lig) {E. coli}
DNA mismatch repair protein (mutH) {E. coli}
DNA mismatch repair protein (mutS) {E. coli}
DNA mismatch repair protein MutL (mutL) {E. coli}
DNA polymerase I (polA) {E. coli}
DNA polymerase III beta subunit (dnaN) {E. coli}
DNA polymerase III delta' subunit (holB) {E. coli}
DNA polymerase III delta subunit (holA) {E. coli}
DNA polymerase III epsilon subunit (dnaQ) {E. coli}
DNA polymerase III, alpha chain (dnaE) {E. coli}
DNA polymerase III, chi subunit (holC) {H. influenzae}
DNA polymerase III, psi subunit (holD) {E. coli}
DNA primase (dnaG) {E. coli}
DNA recombinase (recG) {E. coli}
DNA repair protein (recN) {E. coli}
DNA topoisomerase I (topA) {Bacillus subtilis}
DNA-3-methyladenine glycosidase I (tag) {E. coli}
DNA-dependent ATPase, DNA helicase (recQ) {E. coli}
Dod protein (dod) {Serratia marcescens}
Dosage-dependent dnaK suppressor protein (dksA) {E. coli}
Formamidopyrimidine-DNA glycosylase (fpg) {E. coli}
Glucose-inhibited division protein (gidA) {E. coli}
Glucose-inhibited division protein (gidB) {E. coli}
Hin recombinational enhancer binding protein (fis) {E. coli}
HincII endonuclease (HincII) {H. influenzae}
HindII modification methyltransferase (hindIIM) {H. influenzae}
HindII restriction endonuclease (hindIIR) {H. influenzae}
Holliday junction DNA helicase (ruvA) {E. coli}
Holliday junction DNA helicase (ruvB) {E. coli}
Integrase/recombinase protein (xerC) {E. coli}
Integration host factor alpha subunit (himA) {E. coli}
Integration host factor beta subunit (IHF-beta) (himD) {E. coli}
Methylated-DNA-protein-cysteine methyltransferase (dat1) {Bacillus subtilis}
MioC protein (mioC) {E. coli}
Modification methylase HgiDI (M.HgiDI) {Herpetosiphon aurantiacus}
Modification methylase HincI (hincIM) {H. influenzae}
Mutator protein MutT (AT-GC mutation) (mutT) {E. coli}
Negative regulator of replication initiation (seqA) {E. coli}
Primosomal protein N precursor (priB) {E. coli}
Primosomal protein replication factor (priA) {E. coli}
Putative ATP-dependent helicase (dinG) {E. coli}
RecF protein (recF) {E. coli}
RecO protein (recO) {E. coli}
Recombinase (recA) {H. influenzae}
Recombination protein (rec2) {H. influenzae}
RecR protein (recR) {E. coli}
Regulatory protein (recX) {Pseudomonas fluorescens}
Rep helicase (rep) {E. coli}
Replication protein (dnaX) {E. coli}
Replicative DNA helicase (dnaB) {E. coli}
Restriction enzyme (hgiDIR) {Herpetosiphon giganteus}
S-adenosylmethionine synthetase 2 (metX) {E. coli}
Shufflon-specific DNA recombinase (rci) {E. coli}
Single-stranded DNA-binding protein (ssb) {H. influenzae}
Site-specific recombinase (rcb) {E. coli}
Topoisomerase I (topA) {E. coli}
Topoisomerase III (topB) {E. coli}
Topoisomerase IV subunit A (parC) {E. coli}
Topoisomerase IV subunit B (parE) {E. coli}
Transcription-repair coupling factor (TRCF) (mfd) {E. coli}
Type I restriction enzyme EcoKI specificity protein (hsdS) {E. coli}
Type I restriction enzyme EcoR124/3 I M protein (hsdM) {E. coli}
Type I restriction enzyme EcoR124/3 I M protein (hsdM) {E. coli}
Type I restriction enzyme EcoR124/3 R protein (hsdR) {E. coli}
Type III restriction-modification enzyme EcoP15 (mod) {E. coli}
Uracil-DNA glycosylase (ung) {E. coli}
XprB protein (xerD) {E. coli}

DNA degradation
Endonuclease III (nth) {E. coli}
Excinuclease ABC subunit A (uvrA) {E. coli}
Excinuclease ABC subunit B (uvrB) {E. coli}
Excinuclease ABC subunit C (uvrC) {E. coli}
Exodeoxyribonuclease I (sbcB) {E. coli}
Exodeoxyribonuclease V (recB) {E. coli}
Exodeoxyribonuclease V (recC) {E. coli}
Exodeoxyribonuclease V (recD) {E. coli}
Exonuclease III (xthA) {E. coli}
Exonuclease VII, large subunit (xseA) {E. coli}
Single-stranded-DNA-specific exonuclease (recJ) {E. coli}

Transcription

RNA synthesis, modification and DNA transcription
ATP-dependent helicase HepA (hepA) {E. coli}
ATP-dependent RNA helicase (srmB) {E. coli}
ATP-dependent RNA helicase DeaD (deaD) {E. coli}
DNA-directed RNA polymerase alpha chain (rpoA) {E. coli}
DNA-directed RNA polymerase beta chain (rpoB) {S. typhimurium}
DNA-directed RNA polymerase beta' chain (rpoC) {E. coli}
N utilization substance protein B (nusB) {E. coli}
Plasmid copy number control protein (pcnB) {E. coli}
Polynucleotide phosphorylase (pnp) {E. coli}
Putative ATP-dependent RNA helicase (rhlB) {E. coli}
RNA polymerase omega subunit (rpoZ) {E. coli}
Sigma factor (algU) {Pseudomonas aeruginosa}
Transcription antitermination protein (nusG) {E. coli}
Transcription elongation factor (greB) {E. coli}
Transcription factor (nusA) {S. typhimurium}
Transcription termination factor Rho (rho) {E. coli}

RNA degradation
Anticodon nuclease blocking agent (prrD) {E. coli}
Exoribonuclease II (RNase II) {E. coli}
Ribonuclease D (rnd) {E. coli}
Ribonuclease E (rne) {E. coli}
Ribonuclease H (rnh) {E. coli}
Ribonuclease HII (EC 3.1.26.4) (RNase HII) {E. coli}
Ribonuclease III (rnc) {E. coli}
Ribonuclease PH (rph) {E. coli}
RNase P (rnpA) {E. coli}
RNase T (rnt) {E. coli}

Translation

Ribosomal proteins: synthesis and modification
Ribosomal protein L1 (rpL1) {E. coli}
Ribosomal protein L10 (rpL10) {S. typhimurium}
Ribosomal protein L11 (rpL11) {E. coli}
Ribosomal protein L11 methyltransferase (prmA) {E. coli}
Ribosomal protein L13 (rpL13) {H. influenzae}
Ribosomal protein L14 (rpL14) {E. coli}
Ribosomal protein L15 (rpL15) {E. coli}
Ribosomal protein L16 (rpL16) {E. coli}
Ribosomal protein L17 (rplQ) {E. coli}
Ribosomal protein L18 (rpL18) {E. coli}
Ribosomal protein L19 (rpL19) {E. coli}
Ribosomal protein L2 (rpL2) {E. coli}
Ribosomal protein L20 (rpL20) {E. coli}
Ribosomal protein L21 (rpL21) {E. coli}
Ribosomal protein L22 (rpL22) {E. coli}
Ribosomal protein L23 (rpL23) {E. coli}
Ribosomal protein L24 (rpL24) {E. coli}
Ribosomal protein L25 (rpL25) {E. coli}
Ribosomal protein L27 (rpL27) {E. coli}
Ribosomal protein L28 (rpL28) {E. coli}
Ribosomal protein L29 (rpL29) {E. coli}
Ribosomal protein L3 (rpL3) {E. coli}
Ribosomal protein L30 (rpL30) {E. coli}
Ribosomal protein L31 (rpL31) {E. coli}
Ribosomal protein L32 (rpL32) {E. coli}
Ribosomal protein L33 (rpL33) {E. coli}
Ribosomal protein L34 (rpL34) {E. coli}
Ribosomal protein L35 (rpL35) {E. coli}
Ribosomal protein L4 (rpL4) {E. coli}
Ribosomal protein L5 (rpL5) {E. coli}
Ribosomal protein L6 (rpL6) {E. coli}
Ribosomal protein L7/L12 (rpL7/L12) {E. coli}
Ribosomal protein L9 (rpL9) {E. coli}
Ribosomal protein S1 (rpS1) {E. coli}
Ribosomal protein S10 (rpS10) {E. coli}
Ribosomal protein S11 (rpS11) {E. coli}
Ribosomal protein S13 (rpS13) {E. coli}
Ribosomal protein S14 (rpS14) {E. coli}
Ribosomal protein S15 (rpS15) {E. coli}
Ribosomal protein S15 (rpS15) {E. coli}
Ribosomal protein S16 (rpS16) {E. coli}
Ribosomal protein S17 (rpsQ) {E. coli}
Ribosomal protein S18 (rpS18) {E. coli}
Ribosomal protein S19 (rpS19) {E. coli}
Ribosomal protein S2 (rpS2) {E. coli}
Ribosomal protein S21 (rpS21) {E. coli}
Ribosomal protein S3 (rpS3) {E. coli}
Ribosomal protein S4 (rpS4) {E. coli}
Ribosomal protein S5 (rpS5) {E. coli}
Ribosomal protein S6 (rpS6) {E. coli}
Ribosomal protein S6 modification protein (rimK) {E. coli}
Ribosomal protein S7 (rpS7) {E. coli}
Ribosomal protein S8 (rpS8) {E. coli}
Ribosomal protein S9 (rpS9) {Haemophilus somnus}
Ribosomal-protein-alanine acetyltransferase (rimI) {E. coli}
Streptomycin resistance protein (strA) {H. influenzae}

Aminoacyl-tRNA synthetases and tRNA modification
Alanyl-tRNA synthetase (alaS) {E. coli}
Arginyl-tRNA synthetase (argS) {E. coli}
Asparaginyl-tRNA synthetase (asnS) {E. coli}
Aspartyl-tRNA synthetase (aspS) {E. coli}
Cysteinyl-tRNA synthetase (cysS) {E. coli}
Cysteinyl-tRNA(Ser) selenium transferase (selA) {E. coli}
Glutaminyl-tRNA synthetase (glnS) {E. coli}
Glutamyl-tRNA synthetase (gltX) {E. coli}
Glycyl-tRNA synthetase alpha chain (glyQ) {E. coli}
Glycyl-tRNA synthetase beta chain (glyS) {E. coli}
Histidyl-tRNA synthetase (hisS) {E. coli}
Isoleucyl-tRNA ligase (ileS) {E. coli}
Leucyl-tRNA synthetase (leuS) {E. coli}
Lysyl-tRNA synthetase (lysU) {E. coli}
Lysyl-tRNA synthetase homolog (genX) {E. coli}
Methionyl-tRNA formyltransferase (fmt) {E. coli}
Methionyl-tRNA synthetase (metG) {E. coli}
Peptidyl-tRNA hydrolase (pth) {E. coli}
Phenylalanyl-tRNA synthetase alpha subunit (pheS) {E. coli}
Phenylalanyl-tRNA synthetase beta subunit (pheT) {E. coli}
Prolyl-tRNA synthetase (proS) {E. coli}
Pseudouridylate synthase I (hisT) {E. coli}
Queuosine biosynthesis protein (queA) {E. coli}
Selenium metabolism protein (selD) {E. coli}
Seryl-tRNA synthetase (serS) {E. coli}
Threonyl-tRNA synthetase (thrS) {E. coli}
Transfer RNA-guanine transglycosylase (tgt) {E. coli}
tRNA (guanine-N1)-methyltransferase (M1G-methyltransferase) (trmD) {E. coli}
tRNA (uracil-5-)-methyltransferase (trmA) {E. coli}
tRNA delta(2)-isopentenylpyrophosphate transferase (trpX) {E. coli}
tRNA nucleotidyltransferase (cca) {E. coli}
tRNA-guanine transglycosylase (tgt) {E. coli}
Tryptophanyl-tRNA synthetase (trpS) {E. coli}
Tyrosyl-tRNA synthetase (tyrS) {Thiobacillus ferrooxidans}
Valyl-tRNA synthetase (valS) {E. coli}

Nucleoproteins
DNA-binding protein (putative) {Bacillus subtilis}
DNA-binding protein (rdgB) {Erwinia carotovora}
DNA-binding protein H-NS (hns) {E. coli}
DNA-binding protein HU-alpha (NS2) (HU-2) {E. coli}

Protein translation and modification
Disulfide oxidoreductase (por) {H. influenzae}
DNA processing chain A (dprA) {E. coli}
Elongation factor EF-Ts (tsf) {E. coli}
Elongation factor EF-Tu (duplicate) (tufB) {E. coli}
Elongation factor EF-Tu (duplicate) (tufB) {E. coli}
Elongation factor G (fusA) {E. coli}
Elongation factor P (efp) {E. coli}
Glutamate-ammonia-ligase adenylyltransferase (glnE) {E. coli}
Initiation factor 3 (infC) {E. coli}
Initiation factor IF-1 (infA) {E. coli}
Initiation factor IF-2 (infB) {E. coli}
Antibiotic MccB17 maturation protein (pmbA)
Methionine aminopeptidase (map) {E. coli}
Oxidoreductase (dsbB) {E. coli}
Peptide chain release factor 2 (prfB) {S. typhimurium}
Peptide chain release factor 3 (prfC) {E. coli}
Peptidyl-prolyl cis-trans isomerase B (ppiB) {E. coli}
Polypeptide chain release factor 1 (prfA) {S. typhimurium}
Polypeptide deformylase (formylmethionine deformylase) (def) {E. coli}
Ribosome releasing factor (frr) {E. coli}
Rotamase, peptidylprolyl cis-trans isomerase (slyD) {E. coli}
Rotamase, peptidylprolyl cis-trans isomerase (slyD) {E. coli}
Transcription elongation factor (greA) {E. coli}
Translation factor (selB) {E. coli}
XprA protein (xprA) {E. coli}

Degradation of proteins, peptides and glycopeptides
Aminopeptidase A (pepA) {Rickettsia prowazekii}
Aminopeptidase A/I (pepA) {E. coli}
Aminopeptidase N (pepN) {E. coli}
Aminopeptidase P (pepP) {E. coli}
ATP-dependent protease proteolytic subunit (clpP) {E. coli}
ATP-dependent protease ATPase subunit (clpX) {E. coli}
ATP-dependent protease binding subunit (clpB) {E. coli}
Collagenase (prtC) {Porphyromonas gingivalis}
HflC protein (hflC) {E. coli}
IgA1 protease (iga1) {H. influenzae}
IgA1 protease (iga1) {H. influenzae}
IgA1 protease (iga1) {H. influenzae}
Lon protease (lon) {Bacillus brevis}
Oligopeptidase A (prlC) {E. coli}
Peptidase D (pepD) {E. coli}
Peptidase E (pepE) {E. coli}
Peptidase T (pepT) {S. typhimurium}
Periplasmic serine protease Do and heat shock protein (htrA) {E. coli}
Putative ATP-dependent protease (sms) {E. coli}
Proline dipeptidase (pepQ) {E. coli}
Protease (prtH) {Porphyromonas gingivalis}
Protease IV (sppA) {E. coli}
Protease specific for phage lambda cII repressor (hflK) {E. coli}
Putative protease (sohB) {E. coli}
Sialoglycoprotease (gcp) {Pasteurella haemolytica}

Transport and binding proteins

Amino acids, peptides and amines
Arginine transport ATP-binding protein ArtP (artP) {E. coli}
Arginine transport system permease protein (artM) {E. coli}
Arginine transport system permease protein (artQ) {E. coli}
Biopolymer transport protein (exbB) {H. influenzae}
Biopolymer transport protein (exbD) {E. coli}
Branched-chain amino acid transport system II carrier protein (braB) {Pseudomonas aeruginosa}
D-alanine permease (dagA) {Alteromonas haloplanktis}
Dipeptide transport ATP-binding protein (dppD) {E. coli}
Dipeptide transport ATP-binding protein (dppF) {E. coli}
Dipeptide transport system permease protein (dppB) {E. coli}
Dipeptide transport system permease protein (dppB) {E. coli}
Dipeptide transport system permease protein (dppC) {E. coli}
Glutamate permease (gltS) {E. coli}
Glutamine transport system permease protein (glnP) {E. coli}
Glutamine-binding periplasmic protein (glnH) {E. coli}
Leucine-specific transport protein (livG) {E. coli}
Membrane component, LIV-II transport system (brnQ) {S. typhimurium}
Oligopeptide-binding protein (oppA) {E. coli}
Oligopeptide-binding protein (oppA) {E. coli}
Oligopeptide transport ATP-binding protein (oppD) {S. typhimurium}
Oligopeptide transport ATP-binding protein (oppF) {S. typhimurium}
Oligopeptide transport system permease protein (oppC) {S. typhimurium}
Peptide transport periplasmic protein (sapA) {S. typhimurium}
Peptide transport system ATP-binding protein (sapD) {S. typhimurium}
Dipeptide transport system permease protein (dppC) {E. coli}
Peptide transport system permease protein (sapB) {S. typhimurium}
Periplasmic arginine-binding protein (artI) {Pasteurella haemolytica}
Proton/glutamate symport protein (gltP) {Bacillus caldotenax}
Putrescine transport protein (potE) {E. coli}
Serine transporter (sdaC) {E. coli}
Spermidine/putrescine transport ATP-binding protein (potA) {E. coli}
Spermidine/putrescine permease protein (potB) {E. coli}
Spermidine/putrescine permease protein (potC) {E. coli}
Spermidine/putrescine permease protein (potD) {E. coli}
Spermidine/putrescine permease protein (potD) {E. coli}
Tryptophan-specific permease (mtr) {E. coli}
Tyrosine-specific transport protein (tyrP) {E. coli}
Tyrosine-specific transport protein (tyrP) {E. coli}

Cations
Bacterioferritin comigratory protein (bcp) {E. coli}
Ferric enterobactin transport ATP-binding protein (fepC) {E. coli}
Ferric enterobactin transport ATP-binding protein (fepC) {E. coli}
Ferrichrome-iron receptor (fhuA) {E. coli}
Ferritin-like protein (rsgA) {E. coli}
Ferritin-like protein (rsgA) {E. coli}
Iron(III) dicitrate transport ATP-binding protein (fecE) {E. coli}
Iron(III) dicitrate transport system permease protein (fecD) {E. coli}
Magnesium and cobalt transport protein (corA) {E. coli}
Major ferric iron-binding protein precursor (fbp) {Neisseria gonorrhoeae}
Mercuric transport protein (merT) {Pseudomonas aeruginosa}
Mercuric ion scavenger protein (merP) {Pseudomonas fluorescens}
Mercuric ion scavenger protein (merP) {Pseudomonas fluorescens}
Molybdate-binding periplasmic protein precursor (modB) {Azotobacter vinelandii}
Na+/H+ antiporter 1 (nhaA) {E. coli}
Na+/H+ antiporter 1 (nhaB) {E. coli}
Na+/H+ antiporter 1 (nhaC) {Bacillus firmus}
Periplasmic-binding-protein-dependent iron transport protein (sfuB) {Serratia marcescens}
Periplasmic-binding-protein-dependent iron transport protein (sfuC) {Serratia marcescens}
Potassium efflux system (kefC) {E. coli}
Potassium/copper-transporting ATPase A (copA) {Enterococcus faecalis}
Sodium/proline symporter (proline permease) (putP) {E. coli}
TonB protein (tonB) {H. influenzae}
Trk system potassium uptake protein (trkA) {E. coli}

Carbohydrates, organic alcohols and acids
2-oxoglutarate/malate translocator (SODiT1) {Spinacia oleracea}
D-galactose-binding periplasmic protein (mglB) {E. coli}
D-xylose transport ATP-binding protein (xylG) {E. coli}
D-xylose-binding periplasmic protein (rbsB) {E. coli}
Enzyme I (ptsI) {S. typhimurium}
Formate transporter (formate channel) {E. coli}
Fructose permease IIA/FPr component (fruB) {E. coli}
Fructose permease IIBC component (fruA) {E. coli}
Fucose operon protein (fucU) {E.
coli} glpF protein (glpF) {E. coli} glpF protein (glpF) {E. coli} Gluconate permease (gntP) {Bacillus subtilis} Glucose phosphotransferase enzyme III-glc (crr) {E. coli} Glycerol-3-phosphatase transporter (glpT) {E. coli} High affinity ribose transport protein (rbsA) {E. coli} High affinity ribose transport protein (rbsC) {E. coli} High affinity ribose transport protein (rbsD) {E. coli} L-fucose permease (fucP) {E. coli} L-lactate permease (lctp) {E. coli} Lactam-utilizing protein (lamB) {Emericella nidulans} mglA protein (mglA) {E. coli} mglC protein (mglC) {E. coli} Periplasmic ribose-binding protein (rbsB) {E. coli} Phosphohistidinoprotein-hexose phosphotransferase (ptsH) {large Enterobacteria} Potassium pathway homolog (kch) {E. coli} Putative aspartate transport protein (dcuA) {E. coli} Putative aspartate transport protein (dcuA) {E. coli} Ribose transport permease protein (xylH) {E. coli} Sodium- and chloride-dependent GABA transport {Homo saviens} Sodium-dependent noradrenaline transport {Homo saviens} Nucleosides, purines and pyrimidines Ribonucleotide transporting ATP-binding protein (mkl) {Lepromycetes} Uracil permease (uraA) {E. coli} Anion Cysteine synthetase (cysZ) {E. coli} Hydrophilic membrane-bound protein (modC) {E. coli} Hydrophobic membrane-bound protein (modB) {E. coli} Integrated membrane protein (pstA) {E. coli} Nitrate transport ATPase component (nasD) {Klebsiella pneumoniae} Peripheral membrane protein B (pstB) {E. coli} Peripheral membrane protein c (pstC) {E. coli} Periplasmic phosphate-binding protein (pstS) {E. coli} Periplasmic phosphate-binding protein (pstS) {E. coli} Phosphate permease (YBR296C) {Saccharomyces cerevisiae} Other ATP-dependent transposable element homolog (msbA) {Hemophilus influenzae} ATP-binding protein (abc) {E. 
coli} Pustular fibrosis transmembrane conductance regulator {Boss Taurus} Heme-binding riboprotein (dppA) {Hemophilus influenzae} Heme-hemopexin-binding protein (hxA) {Hemophilus influenzae} Hemin permease (hemU) {Elnicia enterocolitica} High affinity choline transport protein (betT) {E. coli} Lactoferrin-binding protein (IbpA) {N. meningitidis} Na + / sulfate cotransport factor {Lattus norvegicus} Pantothenate permease (panF) {E. coli} Transferrin binding protein 1 precursor (tbp1) {N. meningitidis} Transferrin binding protein 1 precursor (tbp1) {N. meningitidis} Transferrin binding protein 1 precursor (tbp1) {N. meningitidis} Transferrin binding protein 2 precursor (tbp2) {N. meningitidis} Transferrin-binding protein (tfbA) {Actinobacillus pleurnew Monier} Transferrin-binding protein 1 (tbp1) {meningococcus} Transferrin-binding protein 1 (tbp2) {meningococci} Transport ATP-binding protein (cydD) {E. coli} Transport ATP-binding protein (cydD) {E. coli} Cell processing Chaperones Chaperonin (groES) (mopB) {E. coli} Heat shock protein (groEL) (mopA) {Hemophilus Juclay} Heat Striking Protein (dnaJ) {E. coli} Thermoprotein c62.5 (httpG) {E. coli} hsc66 protein (hsc66) {E. coli} hsp70 protein (dnaK) {E. coli} Cell division Cell division ATP-binding protein (ftsE) {E. coli} Cytostatic factor (sulA) {Vibrio chorele} Cell division protein (ftsA) {E. coli} Cell division protein (ftsH) {E. coli} Cell division protein (ftsH) {E. coli} Cell division protein (ftsJ) {E. coli} Cell division protein (ftsL) {E. coli} Cell division protein (ftsQ) {E. coli} Cell division protein (ftsW) {E. coli} Cell division protein (ftsY) {E. coli} Cell division protein (ftsZ) {E. coli} Cell division protein (mukB) {E. coli} Cytoplasmic filament protein (cafA) {E. coli} ftsX protein (ftsX) {E. coli} mukB inhibitor protein (smbA) {E. coli} Penicillin-binding protein 3 (ftsl) {E. coli} Protein and peptide secretion GTP-binding membrane protein (lepA) {E. 
coli} Colicin V secreted ATP-binding protein (cvaB) {E. coli} Riboprotein signal peptidase (lspA) {E. coli} Peptide transport system ATP-binding protein SAPF (sapF) {E. coli} Preprotein translocase (secE) {E. coli} Preprotein translocase SECY subunit (secY) {E. coli} Protein-transport membrane protein (secD) {E. coli} Protein-transport membrane protein (secF) {E. coli} Protein-transport membrane protein (secG) {E. coli} Protein-transport protein (secB) {E. coli} secA protein (secA) {E. coli} Signal peptidase I (lepB) {E. coli} Signal recognition particle protein (54 homologues) (ffh) {E. coli} Initiation factor (tig) {E. coli} Type 4 prepirin-like protein-specific leader peptidase (hopD) {E. coli} xopS protein (xcpS) {Pseudomonas buchida} Detoxification KW20 catalase (hktE) {Hemophilus influenzae} Superoxide dismutase (sodA) {Hemophilus influenzae} Thiophene and furan oxidized protein (thdF) {E. coli} Cell death Hemolysin (tlyc) {Serprina hyodysenteriae} Hemolysin, 21kDa (hly) {Actinobacillus prouroneumoniae} Dead protein (kicA) {E. coli} Killed protein inhibitor (kicB) {E. 
coli} Leukotoxin secreting ATP-binding protein (lktB) {Actinobacillus acti Nomiseten Committance} Conversion com101A protein (comF) {Hemophilus influenzae} Reaction locus E (comE1) {Bacillus subtilis} tfoX protein (tfoX) {Hemophilus influenzae} Converted gene group hypothetical protein (GB: M62809_1) (com) {Hemophilus i Influenza} Converted gene group hypothetical protein (GB: M62809_10) (com) {Hemophilus Influenza} Converted gene group hypothetical protein (GB: M62809_2) (com) {Hemophilus i Influenza} Converted gene group hypothetical protein (GB: M62809_3) (com) {Hemophilus i Influenza} Converted gene group hypothetical protein (GB: M62809_4) (com) {Hemophilus i Influenza} Converted gene group putative protein (GB: M62809_5) (com) {Hemophilus i Influenza} Transformed gene group hypothetical protein (GB: M62809_6) (com) {Hemophilus i Influenza} Converted gene group hypothetical protein (GB: M62809_7) (com) {Hemophilus i Influenza} Other classes Colicin-related functions Colicin resistance protein (tolB) {E. coli} Colicin V-producing protein (pur regulon) (cvpA) {E. coli} Inner membrane protein (tolQ) {E. coli} Inner membrane protein (tolR) {E. coli} Outer membrane integration protein (tolA) {E. coli} Outer membrane integration protein (tolA) {E. coli} Phage-related functions and bophages E16 protein (muE16) {bacteriophage mu} G protein (muG) {bacteriophage mu} G protein (muG) {bacteriophage mu} gam protein {bacteriophage mu} Heat shock protein B253 (grpE) {E. coli} Host factor-1 (HF-1) (hfq) {E. 
coli} I protein (mul) {bacteriophage mu} MuB protein (muB) {bacteriophage mu} N protein (muN) {bacteriophage mu} P protein {bacteriophage mu} Terminase subunit 1 {Bacteriophage SF6} Transposase A (muA) {Bacteriophage mu} Transposons-related functions Insertion sequence IS1016 (V-4) hypothetical protein (GB: X58176_2) {hemophyte Russ Influenza} IS1016-V6 protein (IS1016-V6) {Hemophilus influen Ze} IS1016-V6 protein (IS1016-V6) {Hemophilus influen Ze} IS1016-V6 protein (IS1016-V6) {Hemophilus influen Ze} Drug / homolog susceptibility Acriflavine resistance protein (acrB) {E. coli} Protein for ampD signal (ampD) {E. coli} Bicyclomycin resistance protein (bor) {E. coli} Mercury resistance regulatory protein (merR2) {thiobacillus ferrooxidans} Regulator of drug activity (mda66) {E. coli} Complex drug resistance protein (emrB) {E. coli} Complex drug resistance protein (emrA) {E. coli} Complex drug resistance protein (mdl) {E. coli} Nodule protein T (nodT) {Rhizobium reguminosarum} rRNA (adenosine-N6, N6-)-dimethyltransferase (ksgA) { E. coli} Telluric acid resistant protein (tehA) {E. coli} Telluric acid resistant protein (tehB) {E. coli} Radiation sensitivity radC protein (radC) {E. coli} Adaptive, atypical conditions In-growth protein (auto) {Alkalines eutrophus} Heat shock protein (httpX) {E. coli} Heat shock protein B (ibpB) {E. coli} htrA-like protein (htrH) {E. coli} Invasion protein (invA) {Bartonella basiliformis} NAD (P) H: menadione oxidoreductase {mus musculus} Survival protein (surA) {E. coli} uspA protein (uspA) {E. coli} Virulence plasmid protein (vagC) {Salmonella Dublin} Virulence-related protein A (vapA) {Dikeobacter nodosas} Virulence-related protein C (vapC) {Dikeobacter nodosas} Virulence-related protein C (vapC) {Dikeobacter nodosas} Virulence-related protein D (vapD) {Dikeobacter nodosas} Virulence plasmid protein (mlgA) {Shewanella colwariana} Unidentified 15kDa protein (P15) {E. 
coli} 2-hydroxy acid dehydrogenase homolog (ddh) {Dimomonas mobilis} Beta-lactamase regulatory homolog (mazG) {E. coli} Conjugation co-suppressor (finO) {E. coli} Delta-1-pyrroline-5-carboxylate reductase (proC) {Pseudomonas aeruginosa} devA protein (devA) {Anabaena species} devB protein (devB) {Anabena species} Embryo wealthy protein, group 3 {Triticum estivum} Extragenic repressor (suhB) {E. coli} GCPE protein (protein E) (gpcE) {E. coli} GerC2 protein (gerC2) {Bacillus subtilis} glpX protein (glpX) {E. coli} Glyoxylic acid-derived protein {E. coli} hslU protein (hslU) {E. coli} hslV protein (hslV) {E. coli} ilv-related protein {E. coli} Isochorismate synthase (entC) {Bacillus subtilis} Membrane-related ATRase (cbiO) {Propionibacterium leudenleyii} Membrane protein (lapB) {Basteurella haemolytica} Membrane protein (lapB) {Basteurella haemolytica} N-carbamyl-L-amino acid amide hydrolase {Bacillus stearothermo Philas} Nitrogen fixing protein (nifS) {Anabaena species} Nitrogen-fixing protein (nifS) {Mycobacterium leprea} Nitrogen-fixing protein (nifS) {Mycobacterium leprea} Nitrogen fixing protein (nifU) {Klebsiella pneumoniae} Nitrogen fixation protein (nnfE) {Lordobacter capsalatas} Nitrogen fixation protein (nnfE) {Lordobacter capsalatas} Nitrogenase C (nifC) {Clostilidium pasteurianum} Nitrogenase C (nifC) {Clostilidium pasteurianum} nmt1 protein (nmt1) {Aspergillus pasteurianum} Partitioning system protein (parB) {plasmid RP4} larD protein (larD) {E. coli} larD protein (larD) {E. coli} skp protein (skp) {pasteurella maltosida} Small protein (smpB) {E. coli} spollE protein (spillE) {Coxiella pumeti} Suppressor protein (msgA) {E. coli} Surfactin (sfpo) {Bacillus subtilis} toxR legron (tagD) {Vibrio chorele} traN protein (traN) {plasmid RP4} Transport ATP-binding protein (cydC) {E. coli} Transport ATP-binding protein (cydC) {E. 
coli} vanH protein (vanH) {transposon Tn1546} Mucus state locus protein (mucB) {Pseudomonas aeruginosa} Phenol hydroxylase (ORF6) {Acinetobacter calcoaceticus} Plasma protease C1 inhibitor {Homo sapiens} Known ATP-dependent rearrangement homolog (msbA) Outer membrane protein P2 (ompP2) Single-stranded DNA binding protein (ssb) tonB protein (tonB) Heme-hemopexin-binding protein (hxuA) Adenylate kinase (ATP-AMP transphosphorylase) (adk) Hypothetical protein (SP: P24326) udp-glucose 4-epimerase (galactowardenase) (galE) Hypothetical protein (SP: P24324) PC protein (15 kd peptidoglycan-related outer membrane riboprotein) (pal) Outer membrane protein P1 (ompP1) Converted gene group hypothetical protein (GB: M62809_7) (com) Converted gene group hypothetical protein (GB: M62809_6) (com) Converted gene group hypothetical protein (GB: M62809_5) (com) Converted gene group hypothetical protein (GB: M62809_4) (com) Converted gene group hypothetical protein (GB: M62809_3) (com) Converted gene group hypothetical protein (GB: M62809_2) (com) Converted gene group hypothetical protein (GB: M62809_1) (com) Hinoll endonuclease (Hincll) Modified methylase Hincl (hinclM) Lipooligosaccharide biosynthetic protein Streptomycin resistance protein (strA) Recombinase (recA) tfoX protein (tfoX) Adenylate cyclase (cyaA) 28 kDa membrane protein (hlpA) Protein D (hpd) Riboprotein (hel) Aldose 1-epimerase precursor (mutalotase) (mro) Galactokinase (galK) Galactose-1-phosphate uridylyltransferase (galT) Galactose operon inhibitor (galS) Hypothetical protein (GB: M94205_1) Disulfide oxidoreductase (por) Heme-binding riboprotein (dppA) Protective surface antigen D15 KW20 catalase (hktE) Cyclic AMP receptor protein (crp) Superoxide dismutase (sodA) Outer membrane protein P5 (ompA) DNA helicase II (uvrD) Hindll modified methyltransferase (hindllM) Hindll restriction endonuclease (hindllR) DNA polymerase III, chi subunit (holC) lic-1 operon protein (licC) lic-1 operon protein (licD) 15 kd 
peptidoglycan-related riboprotein (1 pp) Formyltetrahydrofolate hydrolase (purU) Enol pyruvyl shikimate phosphate synthase (aroA) lsg locus hypothetical protein (GB: M94855_8) lsg locus hypothetical protein (GB: M94855_7) Iss gene locus hypothetical protein (GB: M94855_6) Iss gene locus hypothetical protein (GB: M94855_5) lsg locus hypothetical protein (GB: M94855_4) lsg locus hypothetical protein (GB: M94855_3) lsg locus hypothetical protein (GB: M94855_2) lsg locus hypothetical protein (GB: M94855_1)

Continuation of front page
(51) Int.Cl.⁶ Identification code FI C12R 1:91) C07K 16/08
(31) Priority claim number 08/487,429
(32) Priority date June 7, 1995
(33) Priority country United States (US)
(81) Designated states EP (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG), AP (KE, LS, MW, SD, SZ, UG), UA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AL, AM, AT, AU, AZ, BB, BG, BR, BY, CA, CH, CN, CZ, DE, DK, EE, ES, FI, GB, GE, HU, IS, JP, KE, KG, KP, KR, KZ, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, TJ, TM, TR, TT, UA, UG, UZ, VN
(72) Inventor Fleischmann, Robert D., 470 Chifley Square Road, Gaithersburg, Maryland 20878, United States
(72) Inventor Adams, Mark D., 15205 Dufief Drive, North Potomac, Maryland 20878, United States
(72) Inventor White, Owen, 886 Quince Orchard Boulevard, Apartment No. 202, Gaithersburg, Maryland 20878, United States
(72) Inventor Smith, Hamilton O., 8222 Carbridge Circle, Towson, Maryland 21204, United States
(72) Inventor Venter, J. Craig, 11915 Glen Mill Road, Potomac, Maryland 20854, United States

Claims

[Claims]
1. A computer-readable medium having recorded thereon the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1.
2. A computer-readable medium having recorded thereon any one of the fragments of SEQ ID NO: 1, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or a degenerate variant thereof.
3. The computer-readable medium according to claim 1, wherein the medium is selected from the group consisting of a floppy disk, a hard disk, random access memory (RAM), read-only memory (ROM) and CD-ROM.
4. The computer-readable medium according to claim 3, wherein the medium is selected from the group consisting of a floppy disk, a hard disk, random access memory (RAM), read-only memory (ROM) and CD-ROM.
5. A computer-based system for identifying commercially important fragments of the Haemophilus genome, comprising the following elements: a) data storage means comprising the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1; b) search means for comparing a target sequence with the nucleotide sequence of the data storage means in order to identify homologous sequence(s); and c) retrieval means for obtaining the homologous sequence(s) identified in step (b).
6. A method of identifying a commercially important nucleic acid fragment of the genus Haemophilus, comprising the steps of comparing a target sequence with a database containing the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1, and obtaining a nucleic acid molecule comprising a nucleotide sequence complementary to the target sequence, wherein the target sequence is not randomly selected.
7. A method of identifying an expression-regulating fragment of a gene, comprising the steps of comparing a target sequence with a database containing the nucleotide sequence of SEQ ID NO: 1, a representative fragment thereof, or a nucleotide sequence at least 99.9% identical to the nucleotide sequence of SEQ ID NO: 1, and obtaining a nucleic acid molecule comprising a nucleotide sequence complementary to the target sequence, wherein the target sequence comprises a sequence known to regulate gene expression.
8. An isolated protein-encoding nucleic acid fragment of the Haemophilus influenzae Rd genome, wherein the fragment consists of the nucleotide sequence of any one of the fragments of SEQ ID NO: 1 shown in Table 1a, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or a degenerate variant thereof.
9. A vector containing any one of the fragments of the Rd genome, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or a degenerate variant thereof.
10. An isolated fragment of the Haemophilus influenzae Rd genome, wherein the fragment modulates the expression of an operably linked open reading frame, and wherein the fragment consists of a nucleotide sequence of about 10 to 200 bases in length located 5' of any one of the open reading frames shown in Table 1a, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or a degenerate variant thereof.
11. A vector containing any one of the fragments of the Haemophilus influenzae Rd genome according to claim 8.
12. An organism modified to contain any one of the fragments of the Haemophilus genome according to claim 8.
13. An organism modified to contain any one of the fragments of the Haemophilus genome according to claim 10.
14. A method of regulating the expression of a nucleic acid molecule, comprising the step of linking to the 5' end of the nucleic acid molecule a nucleotide sequence of about 10 to 100 bases located 5' of any one of the fragments of the Haemophilus genome, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or a degenerate variant thereof.
15. An isolated nucleic acid molecule encoding a homologue of any one of the fragments of the Haemophilus genome shown in Table 1a, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, wherein the nucleic acid molecule is produced by the steps of: a) screening a genomic DNA library using any one of the fragments of the Haemophilus genome shown in Table 1a as a target sequence; b) identifying members of the library containing a sequence that hybridizes with the target sequence; and c) isolating the nucleic acid molecule from the member identified in step (b).
16. An isolated DNA molecule encoding a homologue of any one of the fragments of the Haemophilus genome shown in Table 1a, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, wherein the nucleic acid molecule is produced by the steps of: a) isolating mRNA, DNA or cDNA produced from an organism; b) amplifying a nucleic acid molecule homologous to the fragment using amplification primers whose nucleotide sequences are derived from the fragment of the Haemophilus genome and which prime the amplification; and c) isolating the amplified sequence produced in step (b).
17. An isolated polypeptide encoded by any one of the fragments of the Rd genome shown in Table 1a, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or by a degenerate variant of the fragment.
18. An isolated polynucleotide molecule encoding any one of the polypeptides of claim 17.
19. An antibody that selectively binds to any one of the polypeptides of claim 17.
20. A method of producing a polypeptide in a host cell, comprising the steps of: a) incubating a host containing a heterologous nucleic acid molecule under conditions in which the heterologous nucleic acid molecule is expressed to produce a protein, wherein the nucleotide sequence of the heterologous nucleic acid molecule comprises any one of the fragments of the Haemophilus influenzae Rd genome shown in Table 1a, excluding the fragments of SEQ ID NO: 1 shown in Table 1b, or a degenerate variant thereof; and b) isolating the protein.
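
Claims 1, 5, 10 and 14 rest on a few simple sequence operations: measuring identity to SEQ ID NO: 1, comparing a target sequence with stored sequence data (the search and retrieval means of claim 5), and taking a short region 5' of an open reading frame. The patent does not specify algorithms for any of these. The Python sketch below is an illustration only, under stated assumptions: identity is computed ungapped over equal-length sequences, the "search" is exact string matching on both strands rather than a true homology search such as BLAST, coordinates are 0-based on the plus strand, and every function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "data storage means": sequence name -> DNA string.
Database = Dict[str, str]
Hit = Tuple[str, int, str]  # (sequence name, 0-based start, strand)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string made of A, C, G, T or N."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences
    (an assumed stand-in for the '99.9% identical' test)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def search(db: Database, target: str) -> List[Hit]:
    """Hypothetical 'search means': exact occurrences of target on either strand.

    A palindromic target (e.g. a restriction site) is searched only once,
    so each location is reported a single time.
    """
    target = target.upper()
    rc = reverse_complement(target)
    probes = [("+", target)] if rc == target else [("+", target), ("-", rc)]
    hits: List[Hit] = []
    for name, seq in db.items():
        seq = seq.upper()
        for strand, probe in probes:
            pos = seq.find(probe)
            while pos != -1:  # step one base at a time to catch overlaps
                hits.append((name, pos, strand))
                pos = seq.find(probe, pos + 1)
    return hits


def retrieve(db: Database, hits: List[Hit], length: int) -> List[str]:
    """Hypothetical 'retrieval means': stored plus-strand sequence at each hit."""
    return [db[name][pos:pos + length].upper() for name, pos, _ in hits]


def upstream_region(seq: str, orf_start: int, length: int = 100) -> str:
    """Up to `length` bases immediately 5' of a plus-strand ORF that starts
    at 0-based position `orf_start` (claims 10 and 14 speak of roughly
    10 to 200 and 10 to 100 bases, respectively)."""
    if not 10 <= length <= 200:
        raise ValueError("length outside the ~10 to 200 base range")
    return seq[max(0, orf_start - length):orf_start].upper()
```

For example, with a toy database `{"toy": "AAGCGGCCGCTTATGAAACGCTAA"}`, `search(db, "GCGGCCGC")` reports the palindromic NotI recognition site once, at position 2 on the plus strand, and `upstream_region(db["toy"], 12, 10)` returns `"GCGGCCGCTT"`, the 10 bases preceding an ATG at position 12. Exact matching keeps the sketch short; a practical system would substitute an alignment-based search.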