A system steady-state detection algorithm based on hierarchical clustering
Technical field
The present invention relates to a system detection method, and more particularly to a system steady-state detection algorithm based on hierarchical clustering.
Background art
In the study of a process system, steady state is the most important and most common assumption. Whether the system is in steady state directly determines the subsequent methods for modeling, controlling, and optimizing it. The internal principles and structure of a process system are complex: its physical quantities are strongly coupled, and the system is highly nonlinear. When the system is in an unsteady state, the data characteristics of its variables change sharply, which shows up numerically as unstable or abnormal input/output relationships that deviate considerably from the real system. Only under steady operating conditions do the parameters and variables have strong state consistency. For this reason, evaluating equipment operating performance, analyzing plant characteristics, and assessing controller performance all presuppose that the steady operating states of the system have been identified.
With the development of the process industry, systems and production methods in actual production have become increasingly complex. The process systems involved are often multivariable, high-dimensional, and strongly coupled, and overall they exhibit nonlinearity, time variance, uncertainty, and incompleteness. Although such complexity makes mechanism study and modeling very difficult, the adoption of DCS control systems and intelligent instruments means that more and more process data are recorded. The strong and complex coupling among the variables of an actual process system makes it possible to study the stability of the whole system from measurement data. With the continuing development and refinement of data mining and statistical learning theory, the process industry has gradually begun to use related algorithms to solve practical production problems, for example in statistical process control. For steady-state detection, the ideas and methods of big data also differ from those of traditional process control research: the latter usually judges whether a system is stable by formulating a steady-state criterion from a few key variables, whereas the former analyzes the system from the whole data set. In theory, a result based on all the data reflects the actual condition of the system more comprehensively and truthfully, so a more accurate and reliable result can be obtained.
Since the steady-state detection problem was raised in the 1980s, scholars at home and abroad have proposed many different steady-state detection methods. However, because field data are complex, many existing detection methods whose validity was demonstrated in simulation tests do not give fully accurate results in practical applications. In addition, each method is constrained by its own limitations and by the specific requirements of the application, so the occasions to which the methods apply also differ.
Clustering is an important technique in unsupervised learning. It divides the individuals in a data sample into different classes so that individuals within a class are as similar as possible and individuals in different classes are as different as possible. The idea of hierarchical clustering was first proposed by S. C. Johnson in 1967. Unlike other clustering methods such as EM and K-means, hierarchical clustering does not need the number of classes to be given in advance and does not need an iterative optimization procedure; the clustering result is obtained from a given similarity function and threshold. Because no optimization problem is solved, the "curse of dimensionality" can be avoided. However, because hierarchical clustering needs to compute the distance between every pair of samples, its complexity grows with the square of the sample size.
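To make the quadratic growth concrete, the number of pairwise distances for N samples can be written out as below. The sample count N = 10000 is chosen here only for illustration.

```latex
\binom{N}{2} = \frac{N(N-1)}{2}, \qquad
N = 10000 \;\Rightarrow\; \frac{10000 \times 9999}{2} = 49\,995\,000
```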
The flow chart of the traditional method is shown in Fig. 1. Assuming that the data set to be clustered contains N sample individuals, the hierarchical clustering algorithm has the following steps:
1) define the similarity function and the threshold of the algorithm termination condition (generally the maximum within-class distance or the minimum between-class distance);
2) treat each individual in the data set as one class, giving N classes in total;
3) with the current data set divided into k classes, compute the similarity between every pair of classes, where d(i, j) denotes the similarity between class i and class j; if d(i, j) is the minimum value in the current set of pairwise similarities, merge class i and class j, so that the number of clusters changes from k to k-1;
4) compute the current value of the termination condition and terminate the algorithm if the threshold is met;
5) otherwise return to 3); when k=1, all sampling points have been merged into the same class and clustering cannot continue, so the algorithm stops.
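The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular prior method: the function names, the use of Euclidean distance with a maximum-distance class measure, and the choice to test the threshold just before each merge are all assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def class_distance(points, ci, cj):
    """Largest point-to-point distance between two classes (lists of point indices)."""
    return max(euclidean(points[a], points[b]) for a in ci for b in cj)


def agglomerative(points, threshold):
    """Merge the two closest classes until the closest pair is farther apart than threshold."""
    # Step 2: every individual starts as its own class.
    classes = [[i] for i in range(len(points))]
    # Step 5: with a single class left, nothing remains to merge.
    while len(classes) > 1:
        # Step 3: find the pair of classes with the smallest distance d(i, j).
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                d = class_distance(points, classes[i], classes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Step 4 (assumed form): stop once even the closest pair exceeds the threshold.
        if d > threshold:
            break
        classes[i] = classes[i] + classes[j]
        del classes[j]
    return classes


# Hypothetical data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
clusters = agglomerative(points, threshold=1.0)
print(clusters)
```

The nested loop recomputes every class distance at every pass, which is the simplest way to show the steps but also makes the cost of the pairwise comparisons visible.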
The main drawbacks of the conventional methods are that their results depend strongly on the process object, so their generality is poor, and that they mainly perform steady-state detection on a single variable, so their applicability is limited. They also cannot avoid the sharp increase in computation caused by data dimensionality when processing data.
Summary of the invention
To solve the problems described in the background art, the invention proposes a steady-state detection algorithm based on hierarchical clustering.
The technical solution adopted by the invention is as follows:
STEP 1, generate the clustering tree:
1.1) for the industrial data $\{d_i\}$, $i=1,2,3,\dots,N$, containing N sampling points in a continuous time period, take each sampling point in the set as a class, where $d_i$ denotes the industrial data of the class;
1.2) compute the distances between all pairs of classes to obtain an N × N matrix A, whose element in row i and column j is denoted $a_{ij}$, with $a_{ii}=0$; $a_{ij}$ denotes the distance between class $d_i$ and class $d_j$. The resulting matrix A is as follows:
$$A = \begin{bmatrix} 0 & a_{12} & \cdots & a_{1N} \\ a_{21} & 0 & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & 0 \end{bmatrix}$$
1.3) find the minimum distance value $a_{mn}$ between two different classes in matrix A; $a_{mn}$ is their class spacing and indicates that class m and class n are the closest classes. Merge class m and class n into a new class, delete rows m and n and columns m and n from matrix A, and then process the remaining classes in matrix A together with the new merged class in the same way as in step 1.2) to obtain the updated matrix A;
1.4) repeat steps 1.2) to 1.3) iteratively until matrix A becomes a 1 × 1 matrix, recording m, n and $a_{mn}$ of each iteration to form the N × 3 matrix Z that represents the hierarchical clustering tree;
STEP 2, choose the threshold:
2.1) for the matrix Z, proceed in order from the values of the last iteration to the values of the first iteration, and for each iteration compute as follows:
Using the current minimum distance value $a_{mn}$ as the threshold, cluster the industrial data $\{d_i\}$ to obtain the cluster result sequence $T_k$, where k is the iteration ordinal. $T_k$ is an integer sequence of size 1 × N, in which $T_k(i)$ is the i-th cluster result, $T_k(i)=p$, i.e. the class number p assigned to the i-th sampling point. Compute the difference sequence D of $T_k$ by subtracting each pair of adjacent elements, and take the number of zeros in D as the cluster result reasonableness value D_zero(k).
2.2) the reasonableness values D_zero(k) obtained in all iterations form the reasonableness sequence D_zero. Compute the difference sequence of D_zero by subtracting each pair of adjacent elements, find the maximum difference value in it and the index k at which it occurs, and take $z_{3k}$, the distance recorded in row k of Z, as the final threshold.
In STEP 2, proceeding in order from the values of the last iteration to the values of the first iteration means computing until either the first iteration, k=1, or the middle iteration, k=N/2, is reached.
When k is too small, the clustering result is meaningless for the final steady-state identification, so to reduce computation k=N/2 is generally chosen as the stop condition of the algorithm.
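The reasonableness value and the choice of the final threshold in STEP 2 can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the label sequences are assumed to be listed from the largest threshold to the smallest, the "maximum difference" is read as the largest drop in D_zero, and the threshold returned is the one used just before that drop.

```python
def count_zero_diffs(labels):
    """D_zero: number of zeros in the first difference of a label sequence."""
    return sum(1 for a, b in zip(labels, labels[1:]) if b - a == 0)


def pick_threshold(label_sequences, thresholds):
    """Return (k, threshold) at the largest drop of D_zero between consecutive enumerations.

    label_sequences[k] is the clustering T_k obtained with thresholds[k];
    thresholds are assumed to be ordered from largest to smallest.
    """
    d_zero = [count_zero_diffs(t) for t in label_sequences]
    # Change of D_zero from enumeration k to k + 1 (positive means a drop).
    drops = [d_zero[k] - d_zero[k + 1] for k in range(len(d_zero) - 1)]
    k = max(range(len(drops)), key=lambda idx: drops[idx])
    # Assumption: keep the threshold used just before the sharp drop.
    return k, thresholds[k]


# Hypothetical enumeration: the third clustering splits one class into
# two classes that alternate in time, so D_zero falls sharply.
sequences = [
    [1, 1, 1, 1, 2, 2, 2, 2],
    [1, 1, 1, 1, 3, 2, 2, 2],
    [1, 4, 1, 4, 3, 2, 2, 2],
]
thresholds = [4.0, 2.0, 0.5]
best_k, best_threshold = pick_threshold(sequences, thresholds)
print(best_k, best_threshold)
```

Counting zeros in the difference sequence rewards long runs of identical labels, which is why a class that starts alternating with another in time produces such a visible fall.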
STEP 3, judge the steady state in combination with the time sequence:
If the elements $T_i$ of the final cluster result sequence T satisfy the following condition, the system is considered to be in steady state between the m-th sampling point and the (m+k-1)-th sampling point:
$T_i = c,\quad i = m, m+1, m+2, \dots, m+k-1$
where $T_i$ is the cluster result of the i-th sampling point in the final cluster result sequence T, c is a constant result value, $k=\tau/T_s$, τ is the time span threshold, and $T_s$ is the sampling interval of the data.
The size of τ is determined by the time response of the object under study; generally τ=3t*, where t* is the settling time of the unit step response of the system.
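The condition in STEP 3 amounts to finding runs of at least k identical consecutive labels. The sketch below shows one way to do this; the function name and the values of τ and the sampling interval are hypothetical and chosen only to keep the example short.

```python
def steady_flags(labels, min_len):
    """Mark samples that lie in a run of at least min_len equal consecutive labels."""
    flags = [0] * len(labels)
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the sequence or where the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_len:
                for j in range(start, i):
                    flags[j] = 1
            start = i
    return flags


tau = 0.3            # hypothetical time span threshold, in seconds
ts = 0.1             # hypothetical sampling interval, in seconds
k = round(tau / ts)  # minimum run length in samples
labels = [1, 1, 1, 1, 2, 3, 2, 4, 4, 4]
print(steady_flags(labels, k))
```

Here 1 marks a sample judged to be in steady state and 0 marks any other sample; short runs such as the isolated labels in the middle are left at 0 because they do not last for k samples.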
The new class obtained by merging in step 1.3) is placed at the end of the updated matrix A: a new row and column are appended at the end, which can be written $[a_{1(N+1)}\ a_{2(N+1)}\ \dots]^T$.
The row and column numbers of the remaining classes in matrix A stay unchanged, and the new merged class receives a new row and column number that differs from the numbers of all earlier classes, so that every class in the whole process has a unique row and column number.
The distance between the class obtained by merging in step 1.3) and each remaining class in matrix A is computed with the following formula:
$a_{si} = \alpha a_{mi} + (1-\alpha) a_{ni}$
where α is a weight parameter, 0≤α≤1, $a_{si}$ is the spacing between class s (the newly merged class) and class i, $a_{ni}$ is the spacing between class n and class i, and $a_{mi}$ is the spacing between class m and class i.
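The merge bookkeeping described above (a fresh number for every merged class and the weighted distance update) can be sketched as follows. This is a minimal illustration under assumptions: the function name is hypothetical, class numbers are 0-based rather than 1-based, point distances are Euclidean, distances are kept in a dictionary instead of an explicit matrix, and one row is recorded per merge, which gives N-1 rows for N points.

```python
import math


def build_tree(points, alpha=0.5):
    """Build the merge record Z: one row (m, n, a_mn) per merge."""
    n_points = len(points)
    active = list(range(n_points))
    # dist[(i, j)] with i < j holds the distance between classes i and j.
    dist = {}
    for i in range(n_points):
        for j in range(i + 1, n_points):
            dist[(i, j)] = math.dist(points[i], points[j])

    def get(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    z = []
    next_label = n_points  # merged classes get fresh, never-reused numbers
    while len(active) > 1:
        # Closest pair among the classes that are still active.
        m, n = min(
            ((i, j) for idx, i in enumerate(active) for j in active[idx + 1:]),
            key=lambda pair: get(*pair),
        )
        z.append((m, n, get(m, n)))
        s = next_label
        next_label += 1
        rest = [c for c in active if c not in (m, n)]
        # a_si = alpha * a_mi + (1 - alpha) * a_ni for every remaining class i.
        for i in rest:
            dist[(i, s)] = alpha * get(m, i) + (1 - alpha) * get(n, i)
        # The new class goes to the end, mirroring the appended row and column.
        active = rest + [s]
    return z


# Hypothetical one-dimensional data.
tree = build_tree([(0.0,), (1.0,), (5.0,)], alpha=0.5)
print(tree)
```

Because merged classes are always appended with a larger number than any existing class, the list of active classes stays sorted and each class keeps the same number for the whole run.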
" feature vector " (or characteristic point) that each sampled point of system is regarded as to a state space, when system is in
steady state, the feature points fluctuate within a certain range and appear as a high-dimensional Gaussian distribution centered on a particular point. When the system is in a transition state or an unsteady state, the state points leave the original Gaussian distribution and show a more dispersed distribution. Because the states at successive moments differ greatly when the system is unsteady, steady-state detection of a process system can be performed by comparing the degree of difference between system state feature vectors. A clustering algorithm is introduced for this purpose: in steady-state detection, steady-state data are highly similar and can be clustered into one class, while dynamic data points differ greatly from steady-state data and are assigned to different classes.
Owing to the characteristics of the system, the state distribution is only guaranteed to be concentrated when the system is in steady state; when the system is fluctuating, its distribution in the state space varies widely. In other words, while steady-state sampling points are similar to one another, dynamic sampling points have very little similarity, so in the cluster result the dynamic points in a continuous time period may be assigned to many different classes. The number of classes therefore cannot be determined before clustering, and for this reason the invention uses a hierarchical clustering algorithm.
The beneficial effects of the invention are as follows:
The invention introduces the ideas and techniques of "big data" into steady-state detection. It compares the degree of similarity between the sampling points in the data set, performs steady-state detection in combination with the temporal characteristics of the data, and proposes a method for determining the cluster threshold.
The invention is widely applicable and avoids the sharp increase in computation caused by data dimensionality when processing data.
Brief description of the drawings
Fig. 1 is the flow chart of the hierarchical clustering algorithm.
Fig. 2 is the flow chart of the threshold selection process of the method of the invention.
Fig. 3 is the flow chart of judging the steady state in combination with the time sequence in the method of the invention.
Fig. 4 shows the input and output curves of the second-order simulated system of Embodiment 1.
Fig. 5 shows the cluster result of the state points of the second-order system of Embodiment 1.
Fig. 6 shows the cluster detection result of Embodiment 1.
Fig. 7 shows the input and output curves of the second-order simulated system of Embodiment 2.
Fig. 8 compares the cluster results obtained with different thresholds in Embodiment 2.
Fig. 9 shows the curves of the input data to be processed in Embodiment 3.
Fig. 10 shows the cluster result of the 50th iteration in Embodiment 3.
Fig. 11 shows the number of zeros in D and its change over all 50 iterations in Embodiment 3.
Fig. 12 shows representative cluster results of Embodiment 3.
Fig. 13 shows the steady-state detection result of Embodiment 3.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
The method of the invention is applied to steady-state detection of process systems and, as shown in Fig. 2, proceeds as follows. For the data points $\{d_i\}$ to be classified, first define how the distance between points in the state space and the distance between classes are measured. Because the state points of a process system are randomly distributed in the corresponding state space with a certain probability and degree of aggregation, the steady-state detection method measures the distance between points with the Euclidean distance. For the distance between classes, each steady state of interest should form one class in the cluster result, and the aggregation within each class should be as high as possible; the maximum distance meets this requirement best and is therefore selected.
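Written out, the two distance measures named above take the following form. The symbols x, y, $C_a$, $C_b$ and D are notation introduced here for illustration only, and p denotes the dimension of a state point.

```latex
d(x, y) = \sqrt{\sum_{l=1}^{p} (x_l - y_l)^2}, \qquad
D(C_a, C_b) = \max_{x \in C_a,\; y \in C_b} d(x, y)
```

Taking the maximum means that two classes count as close only when every pair of their points is close, which favors compact classes.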
Input of the algorithm: the historical state points $\{d_i\}$ of the system sorted by time, where each $d_i$ is a p-dimensional vector, i=1,2,3,...,N, and N is the number of sampling points.
Output of the algorithm: the class T of each sampling point. T is a one-dimensional sequence of length N, i.e. the sequence of class numbers of the sampling points. The class numbers only distinguish the classes and have no physical or numerical meaning.
Following the steps of hierarchical clustering, the algorithm iterates by computing the distances between points in the data set and checks at each step whether the termination condition has been reached; when it has, the algorithm terminates and outputs the cluster result T.
Because the data analyzed form a time series ordered by time, when the system is in steady state over a period of time its spatial distribution is also relatively concentrated. Using the time order of the data, a period is regarded as steady state if it meets the following two conditions: 1) the data sampling points are continuous in time; 2) the data sampling points are highly similar (i.e. the clustering algorithm assigns them to one class).
To obtain the steady state from the cluster result, the cluster threshold must be determined and the minimum stable time period T must be judged. At the selected threshold δ, if the state points of the system over a period of length not less than T are clustered into the same class, the system is in steady state during that period. The minimum time period T can be determined from the time response of the system object; the following mainly discusses the selection of the cluster threshold δ and its influence on the algorithm.
During the cluster calculation, the data set to be clustered contains N samples, and the process by which the data are gradually aggregated from N classes into 1 class is recorded in the N × 3 cluster tree matrix Z, where $Z_{i1}$ records the number of the first of the two classes merged at the i-th clustering step, $Z_{i2}$ records the number of the second, and $Z_{i3}$ records the distance between the two merged classes. Obtaining the optimal cluster threshold therefore amounts to finding one row of Z to serve as the final row of the algorithm output, whose $Z_{i3}$ is used as the threshold. Here the optimal threshold is obtained by enumerating the result for every row.
A further innovation of the invention is that a reasonable threshold is found by computing a well-defined quantitative index. The principle is as follows. Let $T_k$ be the cluster result of the k-th enumeration, and use its difference sequence D = diff($T_k$) to measure the overall clustering effect. As the threshold decreases, the number of zero elements in D gradually decreases. In the ordinary case, as in panels (1) to (9) of Fig. 8, the number of zeros in D decreases slowly. When a steady-state class is split, however, the system points in some period switch frequently between two classes in the result, which makes the number of zeros in D drop sharply. The cluster results are therefore enumerated until the threshold is so small that it is meaningless for steady-state identification. The step with the largest change in the number of zeros in D is then found and taken as the step at which threshold reduction ends; its threshold is the appropriate steady-state threshold for clustering the current data. The flow chart of threshold selection is shown in Fig. 3.
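Enumerating the rows of Z means re-clustering the data with each recorded merge distance as the threshold. The sketch below shows one way to turn a merge record into a label sequence $T_k$; it is an illustration under assumptions, not the patented procedure itself. The function name is hypothetical, class numbers are 0-based, the r-th merge is assumed to create class number N + r, and the merge distances are assumed to be non-decreasing from the first row to the last.

```python
def cut_tree(z, n_points, threshold):
    """Label each sample by applying only the merges whose distance is <= threshold."""
    # Union-find over original samples and merged classes; the r-th merge
    # (0-based) is assumed to create class number n_points + r.
    parent = list(range(n_points + len(z)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r, (m, n, d) in enumerate(z):
        if d <= threshold:
            s = n_points + r
            parent[find(m)] = s
            parent[find(n)] = s
    # Renumber the resulting classes 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(n_points)]


# Hypothetical merge record for three samples: (first class, second class, distance).
z = [(0, 1, 1.0), (2, 3, 4.5)]
# Enumerate from the last recorded merge back to the first.
for m, n, d in reversed(z):
    print(d, cut_tree(z, 3, d))
```

Each label sequence produced this way can then be scored by counting the zeros in its difference sequence, as the paragraph above describes.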
The embodiment of the present invention is as follows:
Embodiment 1
Embodiment 1 concerns a second-order single-input single-output system. The input and output of the simulation are shown in Fig. 4; the simulated duration is 100 s and the sampling interval is 0.1 s. Fig. 4(a) shows the input variable curve and Fig. 4(b) shows the output variable curve.
No noise is added to the variables in Embodiment 1. The curves of Fig. 4 show that the system is in steady state in the initial stage (t=0~300). At t=300 the input begins a ramp adjustment and the system enters a transition state. After the ramp signal ends at t=600, the output eventually settles and the system enters another steady state. Because no noise is added, the system states are very tightly aggregated in steady state, so the threshold selection algorithm described above is not used here; instead a very small threshold t=0.01 is chosen manually and the data are clustered, giving the result shown in Fig. 5.
The curves in Fig. 4 show that the system has two different steady states over the whole process. In the cluster result of Fig. 5, data points in the same steady state carry the same class label, so Fig. 5 contains two horizontal segments that correspond in time to the two steady states of the system.
The cluster result describes the distances between state points in space; judging whether the system is in steady state also requires the time order of the data. When the system is in steady state over a period of time, the data in that interval are clustered into the same class because of their high similarity. Conversely, this can serve as the steady-state criterion: when all the data in a continuous period τ are clustered into one class by the clustering algorithm, the system is considered to be in steady state. The length τ of the period is determined by the time response of the object under study; here τ=2t*, where t* is the settling time of the unit step response of the system.
The resulting steady-state detection is shown in Fig. 6: Fig. 6(a) shows the cluster result of the state data points of the second-order system, and Fig. 6(b) shows the final steady-state detection result, in which 1 means steady and 0 means unsteady. The result shows that the system leaves the first steady state near t=300 and enters a transition state, and re-enters steady state near t=700. Comparing input and output, although the input finishes the ramp at t=600 and becomes steady, the output y settles only after a further period of adjustment.
Embodiment 2
Embodiment 2 adds noise to the simulation and, using the single-input single-output system of Embodiment 1, further illustrates the threshold selection process. The input and output of the simulated system with noise are shown in Fig. 7: Fig. 7(a) shows the input variable curve and Fig. 7(b) shows the output variable curve.
Fig. 7 shows that the system is in one steady state in the interval t=0~300; the input u then undergoes a ramp change, and after a period in a transition state the system enters another steady state. In the state space of the whole process, the state points of the two steady states are tightly aggregated, while the transition state points are more dispersed. Thresholds are enumerated in descending order; as the cluster threshold shrinks, the points within each class become more and more tightly concentrated. If at step j the threshold has shrunk to the point where points that originally belonged to the same steady state are separated into several classes, the threshold is considered too small, and shrinking it further is unnecessary. The threshold reached in the previous step, step j-1, is then the appropriate value.
The first 12 steps of the enumerated clustering for the single-input single-output second-order system of Embodiment 2 are shown in Fig. 8. In Fig. 8(1) the threshold is large and the algorithm divides the data into only two classes. In Fig. 8(2) to Fig. 8(9) the data in the intermediate transition state are progressively separated by the algorithm. As the cluster threshold shrinks, the aggregation within each class increases and the result of the invention becomes more accurate. In Fig. 8(10), however, the state points of the system for t between 700 and 1000, which originally belonged to one class, are split once the threshold shrinks at this step: many points are stripped out, and the two resulting classes interleave in time. If both classes were regarded as steady states of the system, Fig. 8(10) would mean that during this period the system switches frequently between two steady states with no transition state, which does not occur in natural process systems.
The state in Fig. 8(10) is therefore considered to have reached the limit of threshold reduction, and the threshold corresponding to the result in Fig. 8(9) should be taken as the final, reasonable distance threshold.
Embodiment 3
Embodiment 3 applies the method to historical data generated in a real process; boiler data from a 60MW unit of a power plant are used for the data experiment. The data mainly comprise the boiler load command, generated output, coal input, intake, and the steam and water temperatures and pressures at each measurement point, 180 dimensions in total. 10000 data points are selected and steady-state detection is performed with the clustering algorithm described above. The main variables during this period are shown in Fig. 9.
Fig. 9 shows that the system load undergoes two large adjustments in the period studied, so the interval contains three segments of steady-state data overall. The main steam temperature and main steam pressure fluctuate considerably, and the steam temperature curve alone does not even reveal the fluctuation of the system. The state points formed from all system variables are clustered below. Following the threshold selection method described above, the total number of iterations is set to 50. The data experiment shows that no useful information can be obtained from the result of the 50th iteration, shown in Fig. 10, so iteration stops there. The resulting number of zeros in D and its change are shown in Fig. 11: Fig. 11(a) shows the number of zeros in D over the 50 iterations, and Fig. 11(b) shows the change in the number of zeros in D over the 50 iterations.
In this embodiment the number of zero elements in D changes abruptly at step 16. To show the change in clustering more intuitively, some representative iteration results are given in Fig. 12. Fig. 12 shows that when the iteration goes from step 16 to step 17, the points in the period 0~2000 are split into two classes. The cluster result in the figure agrees with the abrupt change in the number of zero elements in D, which illustrates the soundness of the method of the invention.
Based on the cluster result, a suitable time span is selected for steady-state detection. According to the characteristics of the industrial object, the system is considered to be in steady state if it stays stable for 500 sampling points. In the cluster result of the embodiment, the system is therefore considered to be in steady state wherever all points in an interval of no fewer than 500 points have the same cluster number. The resulting steady-state detection output is shown in Fig. 13; in Fig. 13(c), 1 means steady and 0 means unsteady.
It can thus be seen that the invention has a significant technical effect: the steady-state detection is widely applicable and avoids the sharp increase in computation caused by data dimensionality when processing data.
The above embodiments do not limit the invention, and the invention is not limited to them; anything that meets the requirements of the invention falls within its scope of protection.