JP4947573B2

JP4947573B2 - Microarray data analysis method and analyzer

Info

Publication number: JP4947573B2
Application number: JP2006211110A
Authority: JP
Inventors: 力古澤; 朋治縣; 直亮小野; 真吾鈴木; 明子柏木; 哲也四方; 浩清水; 邦彦金子
Original assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Priority date: 2006-08-02
Filing date: 2006-08-02
Publication date: 2012-06-06
Anticipated expiration: 2026-08-02
Also published as: JP2008039475A

Description

The present invention relates to a method and an apparatus for analyzing data obtained from microarray experiments.

Microarrays such as Affymetrix's GeneChip, also called DNA chips, make it possible to measure the concentrations of thousands of DNA/RNA species simultaneously and quantitatively. They are formed by arranging DNA molecules (probes) at high density on a small substrate made of glass or silicon.

The concentration of a target can be determined by hybridizing the probes on the microarray with the targets to be analyzed, measuring the signal obtained from each probe, and quantifying and analyzing it. Various apparatuses and methods have been researched and developed as data analysis techniques for microarray experiments (see, for example, Patent Documents 1 to 3).

When a microarray is used to measure a very large variety of DNA fragments, as in a transcriptome, various targets that only partially match a probe sequence, rather than the complementary target corresponding to that sequence, may bind to the probe (non-specific binding, hereinafter referred to as "cross-hybridization").

The signal from this cross-hybridization is added to the true signal produced when the intended target binds to the probe, and it therefore acts as a noise component in the measured value. In particular, when the concentration of the intended target in the sample is low, it is difficult to distinguish the signal component due to the true target from the signal component due to cross-hybridization, which is a major factor lowering the reliability of the measurement results.

One method for addressing this problem is the algorithm using mismatch probes that is implemented in Affymetrix's analysis software.

In addition to probes whose sequences are complementary to the intended targets (Perfect Match (PM) probes), the Affymetrix GeneChip (registered trademark) microarray carries synthesized mismatch (MisMatch (MM)) probes whose sequences differ from the PM probe sequences by a single base. The algorithm is based on the idea that the effect of cross-hybridization differs little between a PM probe and its MM probe, so that subtracting the MM probe signal from the PM probe signal cancels the effect of cross-hybridization.
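The subtraction idea can be sketched in a few lines of Python. This is only an illustration of "PM minus MM", not Affymetrix's actual algorithm: the function name `pm_minus_mm`, the averaging over probe pairs, and the sample intensities are all assumptions made for the example.

```python
def pm_minus_mm(pm_signals, mm_signals):
    """Average PM - MM difference over the probe pairs of one probe set.

    Hypothetical helper: each MM signal is taken as the cross-hybridization
    estimate for its paired PM probe and is subtracted from it.
    """
    if len(pm_signals) != len(mm_signals):
        raise ValueError("PM and MM lists must have the same length")
    diffs = [pm - mm for pm, mm in zip(pm_signals, mm_signals)]
    return sum(diffs) / len(diffs)


# Made-up intensities for three PM/MM probe pairs.
estimate = pm_minus_mm([1200.0, 950.0, 1100.0], [300.0, 250.0, 350.0])
```

The result is meaningful only if each MM probe picks up the same cross-hybridization as its PM partner, which is the assumption this subtraction relies on.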

Japanese Patent Laid-Open No. 2004-016131
Japanese Patent Laid-Open No. 2004-013573
International Publication No. WO 2002/001477

In practice, however, the effect of cross-hybridization often differs greatly between PM and MM probes, so the value obtained by the above algorithm does not necessarily reflect the signal with the effect of cross-hybridization properly removed. The resulting error appears as large noise and reduces the reliability of the data. In particular, targets at low concentration have conventionally been difficult to measure because their signal is buried in noise.

The present invention has been made to solve the above problems, and its object is to provide an analysis method and an analysis apparatus that improve the accuracy of gene expression analysis by removing the influence of noise from the signal values obtained in microarray experiments.

In a first aspect of the present invention, there is provided a microarray analysis method for determining the concentration of a target contained in a solution using a microarray on which a plurality of probes are arranged.

In this analysis method, the microarray includes a plurality of measurement probes having base sequences corresponding to the base sequences of the targets to be analyzed, and a plurality of random probes having base sequences that do not correspond to the base sequences of the targets to be analyzed.

The microarray analysis method includes the following steps:
1) hybridizing a solution containing the targets with each probe of the microarray;
2) measuring the fluorescence intensity of each probe, and determining, from the measured fluorescence intensities of the random probes, the amount (t^ns) of background target, that is, target that binds non-specifically to the probes;
3) predicting the degree of influence (NS_i) of cross-hybridization on the measured fluorescence intensity of each measurement probe, based on the amount of background target and the binding energy of each probe; and
4) determining the amount (t^t_i) of the target to be analyzed by excluding the degree of influence of cross-hybridization from the measured fluorescence intensity (I^m_i) of the measurement probe.

In the above microarray analysis method, a model representing the equilibrium between the states a probe can take may be defined, and the amount (t^ns) of background target may be determined using that model. The probe states in the model may include a state in which the probe is free in solution, a state in which the probe is non-specifically bound to a background target, and a state in which the probe is folded by self-hybridization into a secondary structure. In this case, the microarray analysis method may further include a step of hybridizing only a sample of the fragmented genome under analysis with the probes on the microarray, a step of measuring the fluorescence intensity of the random probes resulting from that hybridization, and a step of determining, based on the measured values, the parameters (w^pf, ε^ns) used to obtain the equilibrium constants of the model.

Alternatively, the states in the model may further include a state in which the target is free in solution, a state in which the target is folded by self-hybridization into a secondary structure, and a state in which the probe and the background target are specifically bound. Here, the binding energy is a parameter used to obtain the equilibrium constants of the model. In this case, the microarray analysis method may further include a step of hybridizing a mixture of a sample of the fragmented genome under analysis and DNA of known concentration with the probes on the microarray, a step of measuring the fluorescence intensity of the measurement probes resulting from that hybridization, and a step of determining, based on the measured values, the parameters (w^tf, ε^s) used to obtain the binding energy.

The above microarray analysis method may further include a step of hybridizing some of the plurality of random probes with DNA of known concentration whose base sequences correspond to the base sequences of those probes, a step of measuring the fluorescence intensity of those probes, and a step of determining the proportionality constant between probe fluorescence intensity and target concentration based on the measured fluorescence intensities of those probes and the known DNA concentrations.

In a second aspect of the present invention, there is provided a microarray analysis apparatus that analyzes signals from a microarray on which a plurality of probes are arranged and determines the concentration of a target in a sample solution.

The analysis apparatus includes data storage means for storing information on the model used for analysis, input means for inputting the measured fluorescence intensity of each probe obtained by hybridizing a solution containing the targets with each probe of the microarray, and processing means for calculating the target concentration (t^t_i) from the input fluorescence intensity measurements using the information on the model. The processing means determines, from the input fluorescence intensity measurements of the random probes, the amount (t^ns) of background target, that is, target that binds non-specifically to the probes; predicts the degree of influence (NS_i) of cross-hybridization on the measured fluorescence intensity of each measurement probe based on the amount of background target and the binding energy of each probe; and determines the amount (t^t_i) of the target to be analyzed by excluding the degree of influence of cross-hybridization from the measured fluorescence intensity (I^m_i) of the measurement probe.

In a third aspect of the present invention, there is provided an analysis program for causing a computer to execute processing that analyzes signals from a microarray on which a plurality of probes are arranged and determines the concentration of a target in a sample solution.

The analysis program causes a computer to execute a step of inputting the measured fluorescence intensity of each probe obtained by hybridizing a solution containing the targets with each probe of the microarray; a step of determining, from the input fluorescence intensity measurements of the random probes, the amount (t^ns) of background target, that is, target that binds non-specifically to the probes; a step of predicting the degree of influence (NS_i) of cross-hybridization on the measured fluorescence intensity of each measurement probe based on the amount of background target and the binding energy of each probe; and a step of determining the amount (t^t_i) of the target to be analyzed by excluding the degree of influence of cross-hybridization from the measured fluorescence intensity (I^m_i) of the measurement probe.

According to the present invention, the amount of target that causes cross-hybridization (background target) is calculated from the fluorescence intensity of probes whose base sequences do not correspond to any target under analysis (random probes). Based on the calculated amount of background target and the binding energy of each probe, the influence of the background target on the fluorescence intensity is evaluated, and the concentration of the target under analysis is determined with this influence excluded. Because this method removes the influence (noise) of the background target from the measured fluorescence intensity, the true target concentration can be determined accurately. In particular, it becomes possible to properly evaluate low-concentration targets that were previously unmeasurable because they were buried in cross-hybridization noise.

Embodiments of the microarray analysis method and analysis apparatus of the present invention are described below with reference to the accompanying drawings.

The microarray analysis method and analysis apparatus of the present invention predict the magnitude of the effect of cross-hybridization (non-specific binding) from the binding energy of the base sequence of each microarray probe and exclude it from the actual fluorescence intensity signal, making it possible to obtain the concentration of the target under analysis accurately. In the following description, RNA/DNA fragments in a sample that bind to probes at random, regardless of the base sequence of the intended target, are called "background targets".

1. Microarray
FIG. 1 shows the probe configuration of the microarray used in the microarray analysis method of the present invention. The microarray 10 is designed to include three types of probes: measurement probes 11, random probes 13, and control probes 15.

The measurement probes 11 are the probes used for the actual measurement. Their base sequences are complementary to the genome of the cells of interest, and essentially the same probes as on a conventional microarray can be used.

The random probes 13 are probes for evaluating the effect of cross-hybridization. A random probe has a sequence generated artificially from random numbers so that no corresponding base sequence occurs in the genome of the cells of interest. As the criterion for "no corresponding sequence occurs in the genome of the cells of interest", each random probe is aligned with the genome of the cells of interest, and the average length of the contiguously matched subsequences must be no more than 1/2 of the probe length. Because the sample being measured contains no corresponding DNA/RNA, the fluorescence intensity shown by the random probes 13 is considered to be due entirely to cross-hybridization.

Some of the random probes 13 are assigned as control probes 15. The control probes 15 are used to relate signal intensity to the absolute target concentration. Specifically, an oligonucleotide with the sequence corresponding to each control probe is synthesized artificially, added to the sample at a fixed concentration, and allowed to bind to the control probe. Comparing the signal intensity then obtained from the control probe with the absolute concentration of the corresponding oligonucleotide gives the correlation between signal intensity and absolute target concentration.
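As a rough illustration of how control-probe readings could be turned into a proportionality constant, the Python sketch below fits a straight line through the origin by least squares. The linear, unsaturated response, the function name `fit_proportionality_constant`, and the numbers are assumptions for the example and are not taken from the source.

```python
def fit_proportionality_constant(concentrations, intensities, background=0.0):
    """Least-squares slope k of (intensity - background) = k * concentration.

    Assumes the control probes are far from saturation, so that the
    background-subtracted signal is proportional to the spiked concentration.
    """
    numerator = sum(c * (i - background) for c, i in zip(concentrations, intensities))
    denominator = sum(c * c for c in concentrations)
    if denominator == 0:
        raise ValueError("need at least one non-zero concentration")
    return numerator / denominator


# Made-up spike-in concentrations and control-probe intensities.
k = fit_proportionality_constant([1.0, 2.0, 4.0], [110.0, 210.0, 410.0], background=10.0)
```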

2. Microarray Analysis Method
The microarray analysis method of the present invention is described in detail below.

2.1 Evaluation of the Effect of Cross-Hybridization
In the present invention, the amount of target bound by cross-hybridization is predicted first. Specifically, the total amount bound when an unspecified number of targets partially bind to one type of probe is modeled in terms of the average binding energy of those targets. A preliminary experiment is then performed, and the model parameters are optimized numerically so as to minimize the error between the values measured in the preliminary experiment and the values predicted by the model. Using these parameters, the effect of cross-hybridization on each probe is evaluated and the actual measurements are corrected accordingly.

2.1.1 Model
First, the model for evaluating the effect of cross-hybridization is described.

This section describes a model for predicting the fluorescence intensity of probe i (the i-th probe) given its base sequence. The model considered here is for the fluorescence intensity of the random probes 13 due to cross-hybridization, and it is assumed that the sample solution contains no target whose base sequence corresponds to that of a random probe 13.

In this embodiment, as shown in FIG. 2, three possible states of a probe are considered: a state P^fr_i in which the probe is free in solution ((a) in FIG. 2), a state PT^ns_i in which the probe is non-specifically bound to a background target (non-specific target) by cross-hybridization ((c) in FIG. 2), and a state P^fo_i in which the probe is folded by self-hybridization into a secondary structure ((b) in FIG. 2). The model considers the following equilibria between these three states.

- Equilibrium between the linear and folded states due to self-hybridization of the probe (see equation (1))
- Equilibrium between the single-stranded and double-stranded states due to cross-hybridization between the probe and targets (see equation (2))

From these relationships, the following equations are obtained.
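The equations themselves are not reproduced in this text. A reconstruction that is consistent with the state names defined above and with the later references to equations (3) and (4) is sketched below; the assignment of equation numbers and the exact form are assumptions, not a quotation of the original.

```latex
\begin{align}
P^{fr}_i &\rightleftharpoons P^{fo}_i \tag{1}\\
P^{fr}_i + T^{ns} &\rightleftharpoons PT^{ns}_i \tag{2}\\
K^{pf}_i &= \frac{[P^{fo}_i]}{[P^{fr}_i]} \tag{3}\\
K^{ns}_i &= \frac{[PT^{ns}_i]}{[P^{fr}_i]\,[T^{ns}]} \tag{4}
\end{align}
```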

Given the free energy ΔG of each reaction, the equilibrium constants K^pf_i and K^ns_i are obtained from the following equation.

R is the gas constant and T is the absolute temperature under the hybridization conditions.
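A minimal Python sketch of this conversion, using the standard thermodynamic relation K = exp(-ΔG / (RT)), is shown below. The equation is not reproduced in this text, so the sign convention, the unit choice (ΔG in kcal/mol), and the name `equilibrium_constant` are assumptions.

```python
import math

# Gas constant in kcal/(mol*K); assumes delta_g is supplied in kcal/mol.
GAS_CONSTANT_KCAL = 1.987e-3


def equilibrium_constant(delta_g, temperature_k):
    """Standard relation K = exp(-dG / (R*T)) for a reaction free energy dG."""
    return math.exp(-delta_g / (GAS_CONSTANT_KCAL * temperature_k))


# Illustrative call: a free energy of -1 kcal/mol at 45 degrees C (318.15 K).
k_example = equilibrium_constant(-1.0, 318.15)
```

With this convention a negative ΔG (a favorable reaction) gives K greater than 1.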

Let v be the effective volume of the space on the microarray 10 in which the sample and the probes interact. The total amount of probe i can be written as p^t v, where p^t v is taken to be a constant independent of the probe type. Let p_i denote the effective concentration [P^fr_i] of free probe in the space v, and let t^ns denote the effective concentration [T^ns] of all targets that interact non-specifically with probe i. Because the total amount of background target, t^ns v, is considered to be much larger than the total amount of probe i, p^t v, t^ns can be regarded as constant, and equations (3) and (4) together with conservation of mass give the following equation.
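The equation is not reproduced in this text. Under the stated assumptions (constant t^ns, and mass-action constants K^pf_i and K^ns_i defined as the ratios of folded to free probe and of non-specifically bound probe to free probe times background target), conservation of probe mass leads to the following form; it is offered as a reconstruction, not a quotation.

```latex
\begin{align}
p^{t} &= [P^{fr}_i] + [P^{fo}_i] + [PT^{ns}_i]
       = p_i \left( 1 + K^{pf}_i + K^{ns}_i\, t^{ns} \right) \notag\\
p_i &= \frac{p^{t}}{1 + K^{pf}_i + K^{ns}_i\, t^{ns}} \tag{6}
\end{align}
```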

Assuming that the fluorescence intensity of a probe is proportional to the amount of bound target, [PT^ns_i] v, equations (4) and (6) give the fluorescence intensity of probe i due to cross-hybridization as follows.

Here, C is a scaling proportionality constant, and bg is the background due to mechanical effects of the measuring instrument.

C, p^t, and bg can be regarded as constants that do not depend on the experiment. The value of bg is given simply by the minimum of all signal values, and the product C·p^t corresponds to the maximum fluorescence intensity, which is reached when the target is present in excess and the probe is saturated. Therefore, if the amount of background target t^ns and the values of K^pf_i and K^ns_i for probe i are known, the expected fluorescence intensity I^ns_i can be calculated.
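The Python sketch below shows one way this calculation could look, together with its inversion to estimate t^ns from random-probe readings. It assumes the intensity takes the saturating form I^ns_i = C·p^t · K^ns_i t^ns / (1 + K^pf_i + K^ns_i t^ns) + bg, which follows from the definitions above but is not quoted from the source. The function names and the use of a median across probes are illustrative choices; the source describes numerical optimization rather than a closed-form estimate.

```python
from statistics import median


def predicted_ns_intensity(k_pf, k_ns, t_ns, c_pt, bg):
    """Expected cross-hybridization intensity of one probe (assumed form)."""
    bound_fraction = k_ns * t_ns / (1.0 + k_pf + k_ns * t_ns)
    return c_pt * bound_fraction + bg


def background_target_from_probe(intensity, k_pf, k_ns, c_pt, bg):
    """Invert the expression above for t^ns using a single random probe."""
    y = (intensity - bg) / c_pt  # fraction of the probe that is bound
    if not 0.0 <= y < 1.0:
        raise ValueError("intensity outside the range the model can explain")
    return y * (1.0 + k_pf) / (k_ns * (1.0 - y))


def estimate_background_target(probes, c_pt, bg):
    """Median of per-probe estimates; probes are (intensity, K^pf, K^ns) tuples."""
    return median(
        background_target_from_probe(intensity, k_pf, k_ns, c_pt, bg)
        for intensity, k_pf, k_ns in probes
    )
```

As t^ns grows, `predicted_ns_intensity` approaches `c_pt + bg`, matching the statement that C·p^t is the intensity of a saturated probe.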

2.1.2 Evaluation of the Free Energy of the Folded State
To calculate the equilibrium constant K^pf_i used to evaluate the amount of probe in the folded state, the free energy ΔG^pf_i of the secondary structure of the probe is determined. Previous work has shown that the free energy of RNA and DNA secondary structures can be predicted from a given base sequence by dynamic programming (see D. Mathews, J. Sabina, M. Zuker, and D. Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288(5):911-940, 1999), and the prediction program MFOLD1 based on that algorithm is publicly available. In the present invention, this program is used to predict the secondary structure of the DNA of each probe, and the equilibrium constant obtained from its free energy is used to evaluate the effect of the probe's secondary structure. However, the model used in MFOLD1 is based on experiments on DNA molecules free in solution, and the conditions differ for molecules fixed on a solid surface, such as microarray probes. For this reason, the present invention uses for the actual evaluation the free energy obtained with MFOLD1 multiplied by a weighting coefficient w^pf.
Free energy ΔG^pf_i used for evaluation
= (free energy of the probe secondary structure obtained with MFOLD1) × w^pf

2.1.3 Non-specific Nearest Neighbor Model
To calculate the equilibrium constant K^ns_i used to evaluate the amount of non-specifically bound probe, the binding energy ΔG^ns_i of cross-hybridization between the probe and the background targets is determined.

Previous work has shown that the binding energy of probe/target hybridization can be predicted from the base sequence of the probe by a calculation method called the Nearest Neighbor (NN) model (see J. SantaLucia. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA, 95(4):1460-1465, 1998). The present invention extends the idea of this model and approximates the total amount of partial binding between a probe and an unspecified number of targets in terms of their average binding energy.

In this embodiment, the non-specific binding energy ε^ns of each base of the probe is treated as a quantity that depends on the combination of that base and the bases immediately before and after it, and the total binding energy ΔG is calculated as the sum of these quantities. If the type of the k-th base of the probe is given by b_k, the binding energy ΔG^ns is given by the following equation.

For example, given the base sequence "ATGCTTCG", the average binding energy of the probe is calculated as follows.
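The equations are not reproduced in this text. From the description (one term per base, each depending on the base and its two neighbors), a plausible reconstruction of the general sum and of its expansion for "ATGCTTCG" is given below. The treatment of the two end bases as two-base contexts, and the notation with an empty neighbor, are assumptions based on the document's mention of end cases such as "AT".

```latex
\begin{align}
\Delta G^{ns}_i &= \sum_{k=1}^{n} \varepsilon^{ns}\!\left(b_{k-1},\, b_k,\, b_{k+1}\right),
\qquad b_0,\ b_{n+1}\ \text{absent at the sequence ends} \notag\\
\Delta G^{ns}_{\mathrm{ATGCTTCG}} &=
  \varepsilon^{ns}_{\mathrm{AT}} + \varepsilon^{ns}_{\mathrm{ATG}} + \varepsilon^{ns}_{\mathrm{TGC}}
+ \varepsilon^{ns}_{\mathrm{GCT}} + \varepsilon^{ns}_{\mathrm{CTT}} + \varepsilon^{ns}_{\mathrm{TTC}}
+ \varepsilon^{ns}_{\mathrm{TCG}} + \varepsilon^{ns}_{\mathrm{CG}} \notag
\end{align}
```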

Of the 64 possible three-base combinations, treating sequences that become identical on inversion (as reverse complements, such as "ATG" and "CAT") as the same combination leaves 32. If cases that include an end of the sequence, such as "AT", are additionally treated as separate combinations, the number of required parameters ε^ns is 48.
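The parameter count can be checked with a short Python sketch. It assumes that "inversion" means taking the reverse complement (which is what maps "ATG" to "CAT") and that a two-base context at the 3' end is identified with the reverse-complemented context at the 5' end. Under those assumptions the enumeration yields 32 + 16 = 48 keys. The key format (a leading "^" for end contexts) and all function names are illustrative only.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_key(context, position):
    """Map a context to one representative of its reverse-complement class."""
    if position == "inner":   # three-base context around an interior base
        return min(context, reverse_complement(context))
    if position == "start":   # two-base context at the 5' end
        return "^" + context
    return "^" + reverse_complement(context)  # 3' end, read from the other strand


def context_keys(probe):
    """One parameter key per base of the probe."""
    if len(probe) < 2:
        raise ValueError("probe must contain at least two bases")
    keys = [canonical_key(probe[:2], "start")]
    for k in range(1, len(probe) - 1):
        keys.append(canonical_key(probe[k - 1:k + 2], "inner"))
    keys.append(canonical_key(probe[-2:], "end"))
    return keys


def all_parameter_keys():
    keys = {canonical_key("".join(t), "inner") for t in product("ACGT", repeat=3)}
    keys |= {canonical_key("".join(p), "start") for p in product("ACGT", repeat=2)}
    return keys


def nonspecific_binding_energy(probe, eps_ns):
    """Sum of the per-base parameters eps_ns[key] along the probe."""
    return sum(eps_ns[key] for key in context_keys(probe))
```

Here `eps_ns` is a dictionary of fitted parameters keyed by the canonical contexts; the values would come from the preliminary experiment described in the source.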

2.2 Evaluation of the Signal with the Effect of Cross-Hybridization Excluded
Next, building on the cross-hybridization evaluation model above, a model is described for obtaining, from the fluorescence intensity of an actual measurement, the signal with the effect of cross-hybridization excluded (the fluorescence intensity due to hybridization of the intended target).

2.2.1 Model
A model is constructed that predicts the fluorescence intensity, including the effect of cross-hybridization, when a target corresponding to the sequence of probe i is contained in the sample. The following four equilibria are considered in the model.

- Equilibrium between the linear state ((a) in FIG. 2) and the folded state ((b) in FIG. 2) due to self-hybridization of the probe (see equation (10))
- Equilibrium between the linear state ((f) in FIG. 2) and the folded state ((e) in FIG. 2) due to self-hybridization of the target (see equation (11))
- Equilibrium between the single-stranded states ((a) and (f) in FIG. 2) and the double-stranded state ((d) in FIG. 2) due to specific hybridization between the probe and the intended target (see equation (12))
- Equilibrium between the single-stranded states ((a) and (g) in FIG. 2) and the double-stranded state ((c) in FIG. 2) due to cross-hybridization between the probe and the background target (see equation (13))

These state-transition models give the following relationships.
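The equations are not reproduced in this text. Applying mass action to each of the four equilibria listed above gives relations of the following form. The equation numbering and the symbols T^{fr}_i, T^{fo}_i (free and folded target, by analogy with P^{fr}_i and P^{fo}_i) are assumptions; K^{pf}_i, K^{tf}_i, K^{s}_i, K^{ns}_i and the bound states PT^{s}_i, PT^{ns}_i are the names used in the surrounding description.

```latex
\begin{align}
P^{fr}_i \rightleftharpoons P^{fo}_i
  &\quad\Rightarrow\quad K^{pf}_i = \frac{[P^{fo}_i]}{[P^{fr}_i]} \notag\\
T^{fr}_i \rightleftharpoons T^{fo}_i
  &\quad\Rightarrow\quad K^{tf}_i = \frac{[T^{fo}_i]}{[T^{fr}_i]} \notag\\
P^{fr}_i + T^{fr}_i \rightleftharpoons PT^{s}_i
  &\quad\Rightarrow\quad K^{s}_i = \frac{[PT^{s}_i]}{[P^{fr}_i]\,[T^{fr}_i]} \notag\\
P^{fr}_i + T^{ns} \rightleftharpoons PT^{ns}_i
  &\quad\Rightarrow\quad K^{ns}_i = \frac{[PT^{ns}_i]}{[P^{fr}_i]\,[T^{ns}]} \notag
\end{align}
```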

As in Section 2.1, let v be the effective volume of the space on the microarray 10 in which the sample and the probes interact, and let the total amount of probe i be written as p^t v, where p^t v is taken to be a constant independent of the probe type. Let p_i denote the effective concentration [P^fr_i] of free probe in the space v, and let t^ns denote the effective concentration [T^ns] of all targets that interact non-specifically with probe i. If t^t_i denotes the concentration of the target in the applied sample that interacts specifically with probe i, and t^fr_i denotes the concentration of that target free in solution, then equations (14) to (17) and conservation of mass give the following relationships.

Here t^ns is regarded as constant, because the total amount of background target, t^ns v, is considered to be much larger than the total amount of probe i, p^t v. Taking the fluorescence intensity I^s_i of the probe to be proportional to the sum of the amount of specifically bound target and the amount of cross-hybridization, ([PT^s_i] + [PT^ns_i]) v, it is obtained from equations (21) to (23) using an effective equilibrium constant K^ef_i.

Therefore, if the amount t^ns of the background target, the amount t^t_i of the target complementary to the probe being measured, and K^pf_i, K^tf_i, K^ns_i, and K^s_i of probe i are known, the fluorescence intensity I^s_i resulting from specific hybridization and cross-hybridization combined can be predicted. Conversely, given the values of the parameters other than t^t_i and the actual fluorescence intensity I^s_i, the target amount t^t_i, that is, the concentration of the target in the sample, can be determined.

Accordingly, in the present invention, preliminary experiments are performed to determine the values of the parameters other than t^t_i in advance, and the concentration of the target in the sample is then obtained from the measured fluorescence intensity I^s_i using the determined parameters. Because the target concentration obtained in this way excludes the influence of cross-hybridization, it is a highly accurate value.

2.2.2 Evaluation of the free energy of folding
To calculate the equilibrium constant K^tf_i, the free energy ΔG^tf_i of the secondary structure of the target is obtained. Because the target DNA molecule is complementary in sequence to the probe, its secondary structure is expected to form in essentially the same way as that of the probe. However, the actual target differs from the probe in that it consists of base-sequence fragments of various lengths and in that it can move freely in solution. Therefore, as in the previous section, the free energy of the secondary structure calculated from the probe sequence using MFOLD1 is used as the basis of the estimate, but the weighting coefficient w^tf is used in place of w^pf to give the free energy of the secondary structure of the target.
Free energy ΔG^tf_i used for evaluation
= (free energy of the secondary structure of the target obtained by MFOLD1) × w^tf
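The weighting above can be illustrated with a minimal sketch. It assumes the standard thermodynamic relation K = exp(-ΔG/RT) for converting a free energy into an equilibrium constant; the sign convention and exact form used for K^tf_i are fixed by the patent's own equations, which are not reproduced in this section. The function names and the numerical inputs are hypothetical, and the temperature of 318.15 K simply corresponds to the 45 °C hybridization condition mentioned later in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def target_folding_energy(dg_mfold, w_tf):
    """Scale the folding free energy computed from the probe sequence by w_tf."""
    return dg_mfold * w_tf


def equilibrium_constant(delta_g, temp_k=318.15):
    """Assumed relation K = exp(-dG / RT); 318.15 K corresponds to 45 degrees C."""
    return math.exp(-delta_g / (R_KCAL * temp_k))


# Hypothetical inputs: a folding energy of -2.0 kcal/mol and a weight of 0.5.
dg_tf = target_folding_energy(-2.0, 0.5)
k_tf = equilibrium_constant(dg_tf)
print(dg_tf, k_tf)
```

A more negative weighted free energy gives a larger constant under this assumed convention, which is why the single scalar w^tf is enough to rescale all folding contributions at once.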

2.2.3 Nearest Neighbor model
As in the case of non-specific binding, the free energy of hybridization by specific binding is considered to be obtainable approximately from the combination of sequences. Separately from ε^ns of the previous section, another set of 48 parameters ε^s is prepared, and the free energy is calculated as the sum of the contributions from each base pair in the same way as in the previous section.
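The idea of summing per-neighbor contributions can be sketched as follows. This is only an illustration: the parameter table here is indexed by dinucleotide (16 entries) with made-up values, whereas the patent uses its own set of 48 parameters whose indexing is defined in the preceding section and is not reproduced here. The names `nn_free_energy` and `eps_s` are hypothetical.

```python
def nn_free_energy(seq, eps):
    """Sum the contribution of every pair of adjacent bases in seq."""
    return sum(eps[seq[i:i + 2]] for i in range(len(seq) - 1))


# Hypothetical parameter table: one value per dinucleotide, in kcal/mol.
bases = "ACGT"
eps_s = {a + b: -1.0 for a in bases for b in bases}
eps_s["GC"] = -2.0

# "AGCT" contributes the terms AG, GC, and CT.
dg_s = nn_free_energy("AGCT", eps_s)
print(dg_s)
```

Keeping ε^s and ε^ns as two separate tables with the same keys lets the same summation routine serve both the specific and the non-specific case.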

2.3 Parameter determination method
Among the parameters of the above model, the values of w^pf, w^tf, ε^s, and ε^ns (that is, all except t^ns) are considered to be constants that do not depend on the experiment. Their values are therefore determined in advance by performing Experiments 1 and 2 below as preliminary experiments. The parameter determination method is described below.

<Experiment 1>
First, only a sample obtained by fragmenting the genome of the target cell is hybridized to the microarray 10, and the parameters (w^pf, ε^ns) of the cross-hybridization model are determined from the resulting fluorescence intensities of the random probes 13.

Let N^r be the number of random probes 13 included in the microarray 10 used in the experiment. A sample of the target cell is measured using the microarray 10 including the random probes 13, and the fluorescence intensity of each random probe 13 obtained in that measurement is denoted I^r_i (1 ≦ i ≦ N^r). Consider the error R^r between I^r_i and the value I^ns_i predicted by the model (see equation (7)). The predicted value I^ns_i is obtained from equation (7), in which a provisional value is set for the proportionality constant C.

Then, the values of t^ns, w^pf, and the 48 parameters ε^ns that minimize the error R^r are determined using a numerical optimization technique such as Newton's method, a quasi-Newton method, or the Nelder-Mead method. Specifically, the following process is performed.
(1) Appropriate initial values are given to t^ns, w^pf, and ε^ns, and R^r is calculated.
(2) The value of each parameter is changed so that R^r decreases.
(3) R^r is calculated again.
Steps (2) and (3) are repeated while changing the parameters until the error R^r falls to or below a predetermined value close to 0, and the values at convergence are taken as the values of the parameters.
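Steps (1) to (3) can be sketched as a simple iterative search. The sketch below is not one of the methods named in the text (Newton, quasi-Newton, Nelder-Mead); it uses a plain coordinate search with step halving as a stand-in, because that keeps the loop structure visible. The error function is a hypothetical mean squared error for a two-parameter toy model, since equation (25) is not reproduced here, and all names and data are illustrative.

```python
def error(params, xs, ys):
    """Hypothetical error: mean squared difference between model and measurement."""
    a, b = params
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, init, step=1.0, min_step=1e-10, target=1e-12, max_iter=10000):
    params = list(init)
    best = error(params, xs, ys)              # step (1): initial values and error
    for _ in range(max_iter):
        if best <= target:                    # stop once the error is close to 0
            break
        improved = False
        for j in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[j] += delta             # step (2): change one parameter
                e = error(trial, xs, ys)      # step (3): recompute the error
                if e < best:
                    params, best, improved = trial, e, True
        if not improved:
            step /= 2.0                       # refine the search when stuck
            if step < min_step:
                break
    return params, best


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                     # toy data generated by y = 1 + 2x
params, best = fit(xs, ys, (0.0, 0.0))
print(params, best)
```

In practice a library optimizer would replace `fit`; the point of the sketch is only the evaluate, perturb, re-evaluate cycle and the stopping rule on the error.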

Thus, the parameters w^pf and ε^ns are obtained.

<Experiment 2>
Next, a solution in which a sample obtained by fragmenting the genome of the target cell is mixed with DNA of known concentration corresponding to the control probes 15 (hereinafter referred to as the "control sample") is hybridized to the microarray, and the values of the parameters w^tf and ε^s are optimized.

(1) First, the concentration of the background target is estimated from the measured fluorescence intensities of the random probes 13 obtained in Experiment 2. Specifically, K^pf_i and K^ns_i of each probe are calculated from the parameters w^pf and ε^ns obtained in Experiment 1, and the value of t^ns is then found by numerical optimization so that R^r given by equation (25) is minimized. The value of t^ns obtained in this way is taken as the effective total amount of the background target.

(2) Because what is being measured is the fragmented genome of the cell, the target concentration t^t_i corresponding to every measurement probe can be assumed to be constant. Using the parameter values obtained in Experiment 1 and in step (1) of Experiment 2, the optimal values of t^t_i, w^tf, and ε^s are found. Specifically, using equation (26), the values of t^t_i, w^tf, and ε^s are optimized so that the error R^s between the measured fluorescence intensity I_i and the predicted value I^s_i is minimized. The predicted value I^s_i is given by equation (23).

Thus, the amount t^ns of the background target and the parameters w^tf and ε^s are obtained.

(3) Finally, the concentration of the control sample is calculated from the parameters w^tf and ε^s obtained by the optimization and the measured fluorescence intensities of the control probes 15. The value of the scaling proportionality constant C is determined so that the calculated concentration equals the concentration of the control sample actually added.

2.4 Evaluation of target concentration
After the parameters have been determined as described above, the fluorescence intensity of the microarray is actually measured, and the amount t^t_i of the target to be analyzed, that is, the concentration of the target in the sample, is obtained from the measured value I^m_i using those parameters. The details are as follows.

(1) The binding energies ΔG^pf_i, ΔG^ns_i, ΔG^tf_i, and ΔG^s_i of each probe are calculated from the parameters already obtained (w^pf, ε^ns, and so on). The equilibrium constants K^pf_i, K^ns_i, K^tf_i, and K^s_i are obtained from these binding energies.

(2) From the measured fluorescence intensities of the random probes 13, which have no corresponding sequence in the sample, the amount t^ns of the background target that binds non-specifically to each probe as background is obtained using equation (7).

(3) From the measured fluorescence intensity I^m_i of each measurement probe 11, the value of the target concentration t^t_i is obtained using the amount t^ns of the background target and the values calculated above. Specifically, in equation (26), the value of t^t_i is found numerically so that the difference between the measured fluorescence intensity I^m_i of each measurement probe 11 and the theoretical fluorescence intensity I^s_i due to the target of interest is minimized. The theoretical value I^s_i is calculated from equations (20) to (23) using the background target amount t^ns, the equilibrium constants obtained in step S32, and other known values (such as the proportionality constant C).

The target concentration t^t_i obtained in this way is the signal I^m_i of the measurement probe 11 with the influence of the background target (NS_i), that is, the noise component, removed. It is therefore a highly accurate value that depends only on the specific binding between the target of interest and the measurement probe 11.
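Step (3) amounts to inverting a model of intensity as a function of target concentration. The sketch below shows one way this could be done by bisection. The intensity model used here is a hypothetical competitive-binding form chosen only because it increases monotonically with the target amount; it is not the patent's equations (20) to (23), which are not reproduced in this section. All names and numbers are illustrative assumptions.

```python
def model_intensity(t, k_s, k_ns, t_ns, c=1.0):
    """Hypothetical model: specific plus non-specific binding with saturation."""
    x = k_s * t + k_ns * t_ns
    return c * x / (1.0 + x)


def solve_target(i_meas, k_s, k_ns, t_ns, c=1.0, lo=0.0, hi=1e6, iters=200):
    """Find t whose predicted intensity matches i_meas (model is increasing in t)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if model_intensity(mid, k_s, k_ns, t_ns, c) < i_meas:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Round trip with made-up constants: simulate a measurement, then recover t.
i_meas = model_intensity(3.0, k_s=0.5, k_ns=0.1, t_ns=2.0)
t_est = solve_target(i_meas, k_s=0.5, k_ns=0.1, t_ns=2.0)
print(t_est)
```

Because the background term k_ns * t_ns is part of the model, a measured intensity at or below the background-only level resolves to a target amount of essentially zero rather than a negative value.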

3. Microarray analysis system
3.1 System configuration
A microarray analysis system to which the above microarray analysis method is applied is described below. FIG. 1 shows a configuration example of the microarray analysis system of the present invention. The microarray analysis system 100 includes a scanner 20 that reads the fluorescence state of each probe of the microarray 10 and outputs the image information as an electrical signal, and an analysis device 50 that processes the image information from the scanner 20.

The analysis device 50 includes a display unit 51 that displays measurement data, analysis results, and the like on a screen; a data storage unit 53 that stores predetermined data; a processor 55 that executes analysis processing according to a control program; an operation unit 57 with which the user operates the analysis device; a scanner interface 58 that receives image information from the scanner 20; and an external interface 59 that exchanges data with external devices and the like.

The display unit 51 is composed of a liquid crystal display (LCD) or the like and can display input data and analysis results. The processor 55 is, for example, an MPU or a CPU, and controls the operation of the entire analysis device 50 by executing a predetermined program. The operation unit 57 is composed of operation buttons, switches, a keyboard, a mouse, and the like, and the user operates the analysis device 50 through the operation unit 57. The external interface 59 is an interface that enables data to be sent and received between the analysis device and a network or an external device such as a printer. The analysis device 50 is constituted by, for example, a personal computer. The scanner 20 and the analysis device 50 may be integrated.

The data storage unit 53 is composed of a hard disk, a semiconductor storage device such as a flash memory, or a recording medium such as a CD or DVD, and stores data necessary for the processing of the analysis device 50, such as measurement data, analysis results, and control programs. The data storage unit 53 also stores, as model information, the formulas and parameters of the model used in the analysis described above.

3.2 Processing flow
The specific operation of the microarray analysis system 100 is described below.

3.2.1 Parameter determination processing
The parameter determination processing of the microarray analysis system 100 is described with reference to the flowchart of FIG. 4.

First, only the sample obtained by fragmenting the genome of the target cell is hybridized with the probes of the microarray 10 (Experiment 1).

The scanner 20 reads the fluorescence intensities of the random probes 13 from Experiment 1. The processor 55 receives the measured fluorescence intensities from the scanner 20 as an electrical signal (S11). The received measurements are stored in the data storage unit 53.

Based on the received fluorescence intensity measurements of the random probes 13, the processor 55 determines the parameters (w^pf, ε^ns) by a numerical optimization technique using equation (25) (S12).

Next, a solution in which the sample obtained by fragmenting the genome of the target cell is mixed with DNA of known concentration corresponding to the control probes 15 (the control sample) is hybridized with the probes of the microarray 10 (Experiment 2).

The processor 55 receives the fluorescence intensity measurements from Experiment 2 from the scanner 20 (S13) and, based on the received measurements of the random probes 13 and the parameters already obtained, determines the background target concentration t^ns using equation (25) (S14).

Based on the measured fluorescence intensities of the measurement probes 11 and the parameters already obtained, the processor 55 determines the parameters (w^tf, ε^s) using equation (25) (S15). Finally, the proportionality constant C is determined from the measured fluorescence intensities of the control probes 15 and from the measured and theoretical values of the control sample concentration (S16). Each parameter used in the model is determined in this way and stored in the data storage unit 53.

3.2.2 Concentration analysis processing
The concentration analysis processing of the microarray analysis system 100 is described with reference to the flowchart of FIG. 5.

A sample obtained by fragmenting the genome of the target cell is hybridized with the probes of the microarray 10.

The processor 55 receives from the scanner 20 the measured fluorescence intensities of the probes resulting from the hybridization (S31). Based on the parameters stored in the data storage unit 53 (w^pf, ε^ns, and so on), the processor 55 calculates the binding energies ΔG^pf_i, ΔG^ns_i, ΔG^tf_i, and ΔG^s_i of each probe (S32). At this time, the equilibrium constants K^pf_i, K^ns_i, K^tf_i, and K^s_i are obtained from these binding energies and stored in the data storage unit 53.

Subsequently, using equation (7), the processor 55 obtains the amount t^ns of the background target that binds non-specifically to each probe as background from the measured fluorescence intensities I^r_i of the random probes 13, which have no corresponding sequence in the sample (S33).

The processor 55 obtains the value of the target concentration t^t_i from the measured fluorescence intensity I^m_i of each measurement probe 11 and the values calculated above (S34). Specifically, in equation (26), the value of t^t_i is found numerically so that the difference between the measured fluorescence intensity I^m_i of each measurement probe 11 and the theoretical fluorescence intensity I^s_i due to the target of interest is minimized. The theoretical value I^s_i is calculated from equations (20) to (23) using the background target amount t^ns, the equilibrium constants obtained in step S32, and other known values (such as the proportionality constant C).

The obtained target concentration can be displayed on the display unit 51 or output as data through the external interface 59.

Specific examples using the microarray analysis method of the present invention are described below.

4.1 Microarray design
In this example, a custom array designed based on the genome of Escherichia coli was used. The probes on this microarray fall broadly into the following two types.
1) Measurement probes covering an E. coli gene (glnA, 1410 bp) in 14mer steps
2) Random probes based on artificially synthesized base sequences

4.2 Creation of random probes
The procedure used to design the random probes on this microarray is as follows.
1) Using random numbers, generate 1000 character strings of length 25 in which each of the letters A, T, G, and C appears with probability 1/4.

2) Calculate the hybridization free energy of each sequence using the Nearest Neighbor model of a previous study (see J. SantaLucia. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA, 95(4):1460-1465, 1998).

3) From the sequences generated in 1), select 50 sequences whose free energy calculated in 2) falls within each of the three ranges -28±0.5 kcal/mol, -32±0.5 kcal/mol, and -36±0.5 kcal/mol, for a total of 150. If fewer than 50 sequences are found for any one range, return to 1) and generate additional sequences.

4) From each of the 150 sequences obtained in 3), generate sequences with one character deleted from the left end, two characters deleted, and so on, down to the sequence of length 14 with 11 characters deleted. Together with the original sequences, this gives a total of 1800 sequences.

5) For each of the 1800 sequences created in 4), synthesize mismatch sequences in which each base from the 3' end to the 5' end is substituted one base at a time, giving a total of 110700 probes.
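Steps 1), 4), and 5) of this procedure can be sketched in a few lines. The sketch skips the free-energy selection of steps 2) and 3) and simply generates 150 random sequences directly, so the sequences are not the ones used in the patent. The mismatch helper substitutes every position with each of the three other bases, which is an assumption: the text does not state which substitute bases were used, so the sketch does not attempt to reproduce the total of 110700. All function names are hypothetical.

```python
import random


def random_sequence(length, rng):
    """Draw each base independently with probability 1/4 (step 1)."""
    return "".join(rng.choice("ATGC") for _ in range(length))


def truncations(seq, min_len=14):
    """The original sequence plus versions with 1, 2, ... bases removed from the left (step 4)."""
    return [seq[k:] for k in range(len(seq) - min_len + 1)]


def single_base_mismatches(seq):
    """Assumed form of step 5: replace each position with each of the other three bases."""
    out = []
    for i, base in enumerate(seq):
        for alt in "ATGC":
            if alt != base:
                out.append(seq[:i] + alt + seq[i + 1:])
    return out


rng = random.Random(0)
originals = [random_sequence(25, rng) for _ in range(150)]
truncated = [t for s in originals for t in truncations(s)]
print(len(truncated))  # 150 originals x 12 lengths (25 down to 14)
```

Each 25mer yields 12 sequences (lengths 25 down to 14), which is where the figure of 1800 in step 4) comes from.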

In this experiment, 25mer oligonucleotides complementary to each of the 150 sequences generated in 3) were also synthesized so that they could be used as the control probes 15.

Sample preparation, hybridization, washing, and scanning followed the Affymetrix Expression Analysis Technical Manual. Hybridization was performed using an Affymetrix Hybridization Oven 640 at 45 °C and 60 rpm for 16 hours. Washing was performed on an Affymetrix Fluidics Station 450 using the ProkGE_WS2_450 script. The microarrays were scanned with an Affymetrix GeneChip Scanner 3000, and signal intensity values were obtained with Microarray Suite 5.0.

The data were analyzed with original scripts written in the statistical computing language R (see R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5:299-314, 1996). Unless otherwise noted, the fluorescence intensities of each experiment are handled on a logarithmic scale (log10) in the analysis.

4.3 Experiments
Experiment 1: Total genomic DNA of E. coli was extracted, fragmented, and measured.
Experiment 2: Samples in which oligonucleotides were added as the control sample to total genomic DNA of E. coli were measured. Samples were prepared at seven final oligonucleotide concentrations, increasing in 10-fold steps from 1.4 fM to 1.4 nM, and each was measured twice, for a total of 14 measurements.
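The dilution series of Experiment 2 can be written out explicitly as a quick check of the arithmetic. The variable names are illustrative only.

```python
# Seven concentrations in 10-fold steps, starting at 1.4 fM (values in fM).
concentrations_fM = [1.4 * 10 ** k for k in range(7)]

# Each concentration is measured twice.
measurements = [(c, replicate) for c in concentrations_fM for replicate in (1, 2)]

print(concentrations_fM[-1] / 1e6, "nM")  # highest level expressed in nM
print(len(measurements))
```

Six 10-fold steps separate the lowest and highest levels, so the series spans 1.4 fM to 1.4 nM in seven levels, and two replicates per level give the 14 measurements.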

4.4 Determination of parameters
First, the parameters related to cross-hybridization were determined from Experiment 1, in which only the genome was measured. Because no oligonucleotides were added, all signals from the random probes 13 can be regarded as due to cross-hybridization. The number of data points used was 110,700; that is, 110,700 random probes 13 were used.

The background target amount t^ns, the parameter w^pf, and the 48 values of ε^ns were determined by fitting with an optimization function of R. FIG. 6 shows a scatter plot of the values predicted by the optimized model against the measured values. The prediction error R^ns was 0.044, indicating that the amount of cross-hybridization can be predicted with high accuracy.

4.5 Convergence of fitting
The number of data points used for fitting when determining the parameters was varied in order to estimate the minimum number of random probes 13 needed to build a quantitative model. Subsets of 50, 100, 200, 400, 800, 1600, and 3200 probes were sampled at random from the random probes 13. The model parameters were determined from each subset, and the prediction error was obtained when those parameters were used to predict the fluorescence intensities of all remaining random probes. FIG. 7 plots the prediction error against the number of probes used, with the results of two runs using different samples overlaid. When there are too few sample data, the prediction error is large because of overfitting, but the error converges when there are enough samples. Taking experimental variation into account as well, it is preferable to use roughly 1000 or more probes for quantitative analysis.

4.6 Calibration with control probes
Next, using the data from Experiment 2, the original concentration was calculated from the signal values of the control probes 15 by the method of the present invention and compared with the actual concentration of the oligonucleotides added to the sample. Using the parameters obtained in the previous section, concentration values were determined independently for each of the seven experiments with different oligonucleotide concentrations. The calculation used 150 probes of length 25mer. In actual gene expression analysis, arrays are usually designed so that multiple probes match different positions of the same gene, so for convenience the 150 probes are treated as 30 probe sets of five probes each.

The five probes in each set are assumed to correspond to the same amount of target, and the concentration t^t_i of that common target is obtained. Because the necessary parameters have already been determined, t^t_i is found by optimization so that the error given by equation (26) is minimized. For each set, the concentration estimated from the model was calculated independently for each of the seven input amounts. FIG. 8 shows the comparison between the estimated amounts and the actual input amounts.

FIG. 8(a) shows the result of an analysis based on the Affymetrix algorithm as an existing method. The calculation used the data of the 150 probes described above and their corresponding mismatch probes, with the same sets of five probes as probe sets. With the existing method the data scatter widely, and the variance is particularly large on the low-concentration side. In addition, because the Affymetrix algorithm discards data in which the MM probe signal is larger than the PM probe signal, the reliability of the probe-set data drops when the influence of cross-hybridization is large.

In particular, at the lowest concentration, nearly half of the probes are discarded. In contrast, as shown in FIG. 8(b), with the present invention the plotted points lie almost on a straight line from the low-concentration side to the high-concentration side, showing close agreement between the input amount and the estimated amount (correlation coefficient 0.989). The variance between probe sets is also small, so the reliability can be said to be high. Because more accurate measurement is possible with fewer probes, this technique makes it possible to increase the resolution of expression analysis.

5. Summary
According to the present invention, the amount of targets that cause cross-hybridization (background targets) is calculated from the fluorescence intensities of random probes, which bind only non-specifically to the targets being analyzed. Based on the calculated background target amount and the binding energies of each probe, the influence of the background targets on the fluorescence intensity is evaluated, and the concentration of the target of interest is obtained with this influence removed. Because the influence of the background targets can be removed in this way, the true target concentration can be obtained accurately. In particular, low-concentration targets that previously could not be measured because they were buried in cross-hybridization noise can now be evaluated properly.

In the above model, other states may also be considered as possible states of the probe and the target. For example, a state in which probes are bound to each other, or a state in which a target to be analyzed is bound to a background target, may be taken into account.

In the above model, a Nearest Neighbor model with 48 parameters was used to obtain the specific and non-specific binding energies, but other parameters may be added to this model, or some of its parameters may be ignored. For example, an effect that depends on base position may be added, reversed sequences may be treated as distinct rather than identical, or combinations of four or more bases may be considered.
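As a generic illustration of a nearest-neighbor calculation, the sketch below sums one stacking parameter per adjacent base pair of a probe sequence, with optional per-position weights as one possible way to express a position-dependent effect. The parameter table holds placeholder values and is not the 48-parameter set of the embodiment; the function name and the weighting scheme are assumptions.

```python
def nearest_neighbor_energy(seq, stack_energy, weights=None):
    """Sum stacking parameters over adjacent base pairs of seq.

    weights, if given, holds one multiplier per adjacent pair (len(seq) - 1 values).
    """
    n_pairs = len(seq) - 1
    if weights is None:
        weights = [1.0] * n_pairs
    return sum(w * stack_energy[seq[i:i + 2]]
               for i, w in zip(range(n_pairs), weights))

# Placeholder parameters for all 16 dinucleotides (illustrative values only).
bases = "ACGT"
stack_energy = {a + b: -1.0 for a in bases for b in bases}
stack_energy["GC"] = -2.0

print(nearest_neighbor_energy("AGCT", stack_energy))  # AG + GC + CT = -4.0
```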

In the above model, values calculated with MFOLD were used to obtain the free energy of folding, but other methods may be used for this calculation.

In this embodiment, sequences generated artificially using random numbers were used to design the random probes. However, sequences contained in the genome of a different organism may also be used for the random probes, provided that they are not contained in the genome of the cell to be measured.
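A minimal sketch of random-probe design along these lines is shown below. It draws sequences from a seeded random number generator and rejects any candidate that occurs exactly in the genome or in its reverse complement. The default length of 25 bases, the exact-match criterion, and all names are assumptions made for illustration; the embodiment's actual design procedure may differ.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def design_random_probes(genome, n_probes, length=25, seed=0, max_tries=100000):
    """Generate random sequences that do not occur in the genome (either strand).

    Returns fewer than n_probes sequences if max_tries is exhausted first.
    """
    rng = random.Random(seed)
    probes = []
    seen = set()
    for _ in range(max_tries):
        if len(probes) == n_probes:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if cand in seen:
            continue  # avoid duplicate probes
        if cand in genome or reverse_complement(cand) in genome:
            continue  # exact match to the genome on either strand
        seen.add(cand)
        probes.append(cand)
    return probes
```

Partial matches to the genome are intentionally not filtered here, since the random probes are meant to report non-specific binding.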

In this embodiment, the genome of the cell was hybridized as the sample. However, it is also possible to use as the sample cloned DNA obtained by reverse-transcribing RNA extracted from the cells, or material amplified by further transcribing RNA from that DNA. Hybridizing and measuring such a sample makes it possible to analyze the intracellular concentration of the original RNA, that is, the expression level of each gene.

The present invention is not limited to GeneChip-type arrays and is applicable to measurement systems in general that are based on hybridization between oligonucleotide probes and target RNA/DNA.

The present invention can improve the accuracy of gene expression analysis using microarrays and increase the reliability of the measurement data.
Its effects include the following advantages:
1) Reliable data can be obtained using probe sets with fewer probes.
2) Low-concentration targets can be measured more accurately, which is useful for quantitative analysis of genes with low expression levels.

Reducing the number of probes required per probe set increases the number of gene types that can be analyzed on a single microarray, so an improvement in analysis efficiency can be expected. Because expression levels can be measured quantitatively from a small number of probes, the method is also effective for elucidating the function of short non-coding RNAs such as siRNA and miRNA, and applications to gene manipulation and gene therapy using RNA interference are conceivable. The ability to examine in detail the behavior of genes with low expression levels is also highly significant for elucidating the role of transcription factors in gene regulatory networks. For example, genomic drug discovery is attracting great attention in the pharmaceutical industry, and the development of toxicogenomics, which comprehensively analyzes the mechanisms of action of toxicants or drugs at the gene expression level, is particularly important. Microarray analysis is regarded as a promising tool for expression analysis of transcription factors and the accompanying genome network analysis. At present, however, it is difficult to accurately analyze groups of genes with low expression levels, such as transcription factors, and the low quality of the data imposes various constraints. If detailed expression analysis is achieved using the analysis method of the present invention, it is expected to contribute greatly to the development of toxicogenomics and to the accompanying growth of the genomic drug discovery industry.

A diagram showing the probe configuration of the microarray used in the microarray analysis method of the present invention.
A diagram illustrating the possible states of targets and probes in the sample solution.
A diagram showing a configuration example of the microarray analysis system of the present invention.
A flowchart of the parameter determination processing performed by the microarray analysis system of the present invention.
A flowchart of the concentration analysis processing performed by the microarray analysis system of the present invention.
A diagram comparing predicted and measured values of the fluorescence intensity due to background targets (horizontal axis: predicted value; vertical axis: measured value; both axes in arbitrary units (AU) on a log10 scale; the thick dashed line shows y = x and the thin dashed lines show y = 2x and y = x/2).
A diagram showing the relationship between the number of data points used to optimize the model and the prediction error of the model.
A diagram comparing the concentration of the added control with the concentration converted from the signal value by the model (horizontal axis: input concentration; vertical axis: estimated amount; both axes in fM on a log10 scale; the thick dashed line shows y = x): (a) estimates obtained with the existing method; (b) estimates obtained with the present invention.

Explanation of symbols

10 Microarray
11 Measurement probe
13 Random probe
15 Target probe
20 Scanner
50 Analyzer
51 Display unit
53 Data storage unit
55 Processor
57 Operation unit
58 Scanner interface
59 External interface

Claims

A microarray analysis method for obtaining a concentration of a target contained in a sample solution using a microarray in which a plurality of probes are arranged,
a) The microarray includes a plurality of measurement probes having a base sequence corresponding to the base sequence of the target to be analyzed, and a plurality of random probes having a base sequence not corresponding to the base sequence of the target to be analyzed,
b) The microarray analysis method comprises the steps of:
hybridizing a solution containing the target with each probe of the microarray;
measuring the fluorescence intensity of each probe;
determining, from the measured fluorescence intensities of the random probes, the amount (t^ns) of a background target, which is a target that binds non-specifically to the probes;
predicting the degree of influence (NS_i) of cross-hybridization on the measured fluorescence intensity of each measurement probe, based on the amount of the background target and the binding energy of each probe; and
obtaining the amount (t^t_i) of the target to be analyzed by excluding the influence of the cross-hybridization from the measured value (I^m_i) of the fluorescence intensity of the measurement probe.

The microarray analysis method according to claim 1, wherein
a model representing the equilibrium among the possible states of the probe is defined, and the amount (t^ns) of the background target is determined using the model, and
the states of the probe in the model include a state in which the probe is free in solution, a state in which the probe and the background target are non-specifically bound, and a state in which the probe is folded by self-hybridization and has a secondary structure.

The microarray analysis method according to claim 2, further comprising the steps of:
hybridizing only a sample obtained by fragmenting the genome to be analyzed with the probes on the microarray;
measuring the fluorescence intensity of the random probes resulting from the hybridization; and
determining, based on the measured values, parameters (w^pf, ε^ns) for obtaining the equilibrium constants of the model.

The microarray analysis method according to claim 1, wherein
a model representing the equilibrium among the possible states of the probe and the target is defined,
the states in the model include a state in which the probe is free in solution, a state in which the probe is folded by self-hybridization and has a secondary structure, a state in which the target is free in solution, a state in which the target is folded by self-hybridization and has a secondary structure, a state in which the probe and the background target are non-specifically bound, and a state in which the probe and the background target are specifically bound, and
the binding energy is a parameter for obtaining the equilibrium constants of the model.

The microarray analysis method according to claim 4, further comprising the steps of:
hybridizing a mixture of a sample obtained by fragmenting the genome to be analyzed and DNA whose concentration has been determined with the probes on the microarray;
measuring the fluorescence intensity of the measurement probes resulting from the hybridization; and
obtaining, based on the measured values, parameters (w^tf, ε^s) for obtaining the binding energy.

The microarray analysis method according to claim 1, further comprising the steps of:
hybridizing some of the plurality of random probes with DNA that has a base sequence corresponding to the base sequence of those probes and whose concentration has been determined;
measuring the fluorescence intensity of those probes; and
determining a proportionality constant between probe fluorescence intensity and target concentration, based on the measured fluorescence intensity of those probes and the determined concentration of the DNA.

A microarray analyzer that analyzes a signal from a microarray in which a plurality of probes are arranged to determine a target concentration in a sample solution,
a) The microarray includes a plurality of measurement probes having a base sequence corresponding to the base sequence of the target to be analyzed, and a plurality of random probes having a base sequence not corresponding to the base sequence of the target to be analyzed,
b) The analyzer comprises:
data storage means for storing information on the model used for analysis;
input means for inputting the measured value of the fluorescence intensity of each probe obtained as a result of hybridizing the solution containing the target with each probe of the microarray; and
processing means for calculating a target concentration (t^t_i) from the input measured values of fluorescence intensity using the information on the model, and
c) The processing means:
determines, from the input measured fluorescence intensities of the random probes, the amount (t^ns) of a background target, which is a target that binds non-specifically to the probes;
predicts, based on the amount of the background target and the binding energy of each probe, the degree of influence (NS_i) of cross-hybridization on the measured fluorescence intensity of each measurement probe; and
obtains the amount (t^t_i) of the target to be analyzed by excluding the influence of the cross-hybridization from the measured value (I^m_i) of the fluorescence intensity of the measurement probe.

The microarray analyzer according to claim 7, wherein
the data storage means stores information on a model representing the equilibrium among the possible states of the probe,
the states of the probe in the model include a state in which the probe is free in solution, a state in which the probe and the background target are non-specifically bound, and a state in which the probe is folded by self-hybridization and has a secondary structure, and
the processing means obtains the amount (t^ns) of the background target using the model.

The microarray analyzer according to claim 8, wherein the processing means further determines parameters (w^pf, ε^ns) for obtaining the equilibrium constants of the model, based on the measured fluorescence intensities of the random probes obtained when only a sample obtained by fragmenting the genome to be analyzed is hybridized with the probes on the microarray.

The microarray analyzer according to claim 7, wherein
the data storage means stores information on a model representing the equilibrium among the possible states of the probe and the target,
the states in the model include a state in which the probe is free in solution, a state in which the probe is folded by self-hybridization and has a secondary structure, a state in which the target is free in solution, a state in which the target is folded by self-hybridization and has a secondary structure, a state in which the probe and the background target are non-specifically bound, and a state in which the probe and the background target are specifically bound, and
the binding energy is a parameter for obtaining the equilibrium constants of the model.

The microarray analyzer according to claim 10, wherein the processing means obtains parameters (w^tf, ε^s) for obtaining the binding energy, based on the measured fluorescence intensities of the measurement probes obtained when a mixture of a sample obtained by fragmenting the genome to be analyzed and DNA whose concentration has been determined is hybridized with the probes on the microarray.

The microarray analyzer according to claim 7, wherein the processing means determines a proportionality constant between probe fluorescence intensity and target concentration, based on the measured fluorescence intensities of some of the plurality of random probes, obtained when those probes are hybridized with DNA that has a base sequence corresponding to the base sequence of those probes and whose concentration has been determined, and on the determined concentration of the DNA.

An analysis program for analyzing a signal from a microarray in which a plurality of probes are arranged, and causing a computer to execute a process for obtaining a concentration of a target in a sample solution,
a) The microarray includes a plurality of measurement probes having a base sequence corresponding to the base sequence of the target to be analyzed, and a plurality of random probes having a base sequence not corresponding to the base sequence of the target to be analyzed,
b) The analysis program causes the computer to execute the steps of:
inputting the measured value of the fluorescence intensity of each probe obtained as a result of hybridizing the solution containing the target with each probe of the microarray;
determining, from the input measured fluorescence intensities of the random probes, the amount (t^ns) of a background target, which is a target that binds non-specifically to the probes;
predicting the degree of influence (NS_i) of cross-hybridization on the measured fluorescence intensity of each measurement probe, based on the amount of the background target and the binding energy of each probe; and
obtaining the amount (t^t_i) of the target to be analyzed by excluding the influence of the cross-hybridization from the measured value (I^m_i) of the fluorescence intensity of the measurement probe.