Disclosure of Invention
In order to overcome the problems in the related art at least to a certain extent, the application provides a method for detecting and analyzing a chromatographic spectrogram and an electronic device, in which chromatographic peaks are detected based on pattern matching combined with the slope of the chromatographic curve, and analysis is performed based on the detected chromatographic peaks.
In order to achieve the purpose, the following technical scheme is adopted in the application:
In a first aspect, the application provides a method for detecting and analyzing a chromatographic spectrogram, which comprises the following steps:
acquiring original spectrogram data to be processed;
for the original spectrogram data, sequentially sliding from the starting point of a chromatogram curve by adopting a time window with a preset size to perform detection processing of chromatographic peaks until the original spectrogram data is processed, detecting all chromatographic peaks in the curve, and generating a detection result based on the detected chromatographic peaks; wherein, in the detection processing process of each chromatographic peak, the method comprises the following steps:
comparing the slopes of the points on the chromatographic curve with a threshold value, determining reference points of chromatographic peaks based on the comparison result, wherein the reference points comprise a peak start point reference point, a temporary peak top point reference point and a peak end point reference point, and the threshold value is determined based on the calculation and analysis of the no-load output signal of the chromatographic instrument generating the original spectrogram data;
based on the position of the chromatographic peak represented by the reference point on the chromatographic curve, performing pattern recognition detection on the chromatographic curve near the position by taking a Gaussian wave as a matching wave, and taking the peak top in a recognition detection result as a peak top correction point;
correcting the temporary peak point reference point according to the peak point correction point, and taking the corrected point as a peak point reference point;
and determining the chromatographic peak characterized by the peak starting point reference point, the peak end point reference point and the peak top point reference point as the detected chromatographic peak.
Optionally, the comparing the slope of each point on the chromatographic curve with a threshold value, and determining the reference point of the chromatographic peak based on the comparison result includes:
calculating, in real time, the slope of each point on the chromatographic curve during the sliding of the time window;
comparing the calculated slope with a predetermined first threshold, and when the slopes of two consecutive points are both greater than the first threshold, comparing the ordinate values of the two points and determining the point having the smaller ordinate value as the peak start point reference point;
analyzing the sign change of the slope of the points after the peak start point reference point, and when the slope of one point is negative and the slope of the point before it is positive, comparing the ordinate values of the two points and determining the point having the larger ordinate value as the temporary peak top point reference point; and
comparing the slope of the points after the temporary peak top point reference point with a predetermined second threshold, and when the slopes of two consecutive points are both smaller than the second threshold, comparing the ordinate values of the two points and determining the point having the smaller ordinate value as the peak end point reference point.
Optionally, the process of calculating and analyzing the no-load output signal of the chromatographic instrument generating the original spectrogram data includes:
carrying out statistical analysis on the slope change of an output baseline of the chromatographic instrument in no-load, calculating and determining the variance of the slope change, and further determining the standard deviation of the slope change;
three times the standard deviation of the slope change is taken as the first threshold, and minus three times the standard deviation of the slope change is taken as the second threshold.
Optionally, the correcting the temporary peak top reference point according to the peak top correction point, and taking the corrected point as the peak top reference point specifically includes:
comparing the ordinate values of the peak top point correction point and the temporary peak top point reference point, and determining the point with the larger ordinate value as the peak top point reference point.
Optionally, the performing pattern recognition detection on the chromatographic curve near the position by using a gaussian wave as a matching wave, and using a peak top in a recognition detection result as a peak top correction point includes:
taking the chromatographic curve near the position as a curve to be detected, sliding the waveform of the Gaussian wave on the curve to be detected from a left end point to a right end point, simultaneously calculating the correlation coefficient of the chromatographic curve and the Gaussian wave, and obtaining a correlation coefficient array of chromatographic data of the curve to be detected relative to the Gaussian wave based on the calculation result;
and comparing each correlation coefficient in the correlation coefficient array with a preset value, determining the position of the Gaussian peak based on the correlation coefficients whose values are greater than the preset value, and determining the point at that position on the curve to be detected as the peak top point correction point.
Optionally, the generating a detection result based on the detected chromatographic peak comprises:
and integrating the detected chromatographic peaks, and calculating to determine the areas and the heights of the chromatographic peaks.
Optionally, the method further comprises the step of,
detecting and processing the original spectrogram data by adopting a reference chromatogram spectrogram detection algorithm to obtain a reference detection result;
and comparing and analyzing the detection result with the reference detection result, generating a detection evaluation report, and displaying and outputting the detection evaluation report.
Optionally, the comparing and analyzing the detection result and the reference detection result includes:
matching the chromatographic peaks detected in the detection result and the reference detection result, determining the matched chromatographic peaks, and generating a detection evaluation report based on the proportion of the matched chromatographic peaks in the detected chromatographic peaks and the difference of the matched chromatographic peaks.
In a second aspect, the application provides an electronic device, including:
a memory having an executable program stored thereon;
a processor for executing the executable program in the memory to implement the steps of the method described above.
By adopting the above technical scheme, the application has at least the following beneficial effects:
In the technical scheme of the application, chromatographic peaks in liquid chromatograms and gas chromatograms are detected based on pattern matching combined with the slope of the chromatographic curve, so that the overall detection reliability is improved. In the method, the threshold used in slope detection is calculated automatically from the instrument's own signal, and the peak characteristic point detected by pattern matching serves only as a reference for correcting the peak characteristic point obtained by slope detection, which helps overcome the defects of the two approaches in the prior art.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail below. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the background art, in the related art of detecting and analyzing a chromatogram, methods such as the time window method, the derivative method and pattern matching suffer from poor identification of overlapping multiple peaks or from poor universality.
In view of the above, the present application provides a method for detecting and analyzing a chromatogram, which detects chromatographic peaks based on pattern matching combined with the slope of the chromatographic curve. The method helps to make up for the deficiencies in the prior art and to achieve detection and analysis of chromatograms with better overall performance.
As shown in fig. 1 and fig. 2, in an embodiment, the method for detecting and analyzing a chromatogram provided by the present application includes:
step S110, acquiring original spectrogram data to be processed;
step S120, for original spectrogram data, adopting a preset size time window (the size of the window is set based on a standard retention time parameter of an instrument, for example, the window is 10% of the standard retention time parameter) to sequentially slide from the starting point of a chromatographic curve to perform chromatographic peak detection processing, detecting all chromatographic peaks in the curve until the original spectrogram data is processed, and generating a detection result based on the detected chromatographic peaks;
it is easy to understand that, for a certain complete spectrogram data, there are usually a plurality of chromatographic peaks, in other words, the detection process of step S120 is performed by sliding the time window to sequentially detect the individual chromatographic peaks. Specifically, as shown in fig. 2, in step S120, in the process of detecting each chromatographic peak, the method includes:
step S121, comparing the slope of each point on the chromatographic curve with a threshold value, and determining a reference point of a chromatographic peak based on the comparison result, wherein the reference point comprises a peak starting point reference point, a temporary peak top point reference point and a peak end point reference point, and the threshold value is determined based on the calculation and analysis of a no-load output signal of a chromatographic instrument for generating original spectrogram data;
in step S121, unlike the prior art, in the process of performing detection based on the slope, the threshold is not manually set, but determined based on calculation and analysis of the no-load output signal of the chromatography apparatus that generates the original spectrogram data.
Step S122, based on the position of the chromatographic peak represented by the reference point on the chromatographic curve, performing pattern recognition detection on the chromatographic curve near the position by taking a Gaussian wave as a matching wave, and taking the peak top in the recognition detection result as a peak top correction point;
step S123, correcting the temporary peak point reference point according to the peak point correction point, and taking the corrected point as the peak point reference point;
specifically, in this step, by comparing the vertical coordinate values of the peak-top correction point and the temporary peak-top reference point, a point having a larger vertical coordinate value is determined as the peak-top reference point.
And step S124, determining the chromatographic peak represented by the peak starting point reference point, the peak end point reference point and the peak top point reference point as the detected chromatographic peak.
In step S120, the time window is slid, and steps S121 to S124 are repeated in each chromatographic peak detection process until all raw spectrogram data are processed.
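For ease of understanding, the sliding of the time window in step S120 can be sketched as follows. This is only a minimal Python sketch under stated assumptions: the data are uniformly sampled, the window size is derived as 10% of a standard retention time as in the example above, and the window advances by a fixed step. The function names `window_size_from_retention` and `sliding_windows` and the `step` parameter are hypothetical and do not appear in the application.

```python
def window_size_from_retention(retention_time, sample_interval, fraction=0.10):
    """Window size in samples, taken as a fraction of the standard retention time.

    The 10% default mirrors the example in the text; the conversion to a
    sample count is an assumption for uniformly sampled data.
    """
    return max(1, round(fraction * retention_time / sample_interval))


def sliding_windows(n_points, window_size, step):
    """Yield (start, stop) index pairs, sliding from the start of the curve.

    The last window is truncated at the end of the data so that all of the
    original spectrogram data is processed.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    start = 0
    while start < n_points:
        stop = min(start + window_size, n_points)
        yield start, stop
        if stop == n_points:
            break
        start += step
```

In such a sketch, steps S121 to S124 would be applied to the points inside each `(start, stop)` range in turn.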
In the technical scheme of the application, chromatographic peaks in liquid chromatograms and gas chromatograms are detected based on pattern matching combined with the slope of the chromatographic curve, so that the overall detection reliability is improved. In the method, the threshold used in slope detection is calculated automatically from the instrument's own signal, and the peak characteristic point detected by pattern matching serves only as a reference for correcting the peak characteristic point obtained by slope detection, which helps overcome the defects of the two approaches in the prior art.
To facilitate understanding of the technical solutions of the present application, the technical solutions of the present application will be described below with reference to another embodiment.
In this embodiment, similarly, step S210 is performed first, and raw spectrogram data to be processed is acquired.
Then, step S220 is performed, for the original spectrogram data, a preset size time window is adopted to sequentially slide from the starting point of the chromatographic curve to perform chromatographic peak detection processing, until the original spectrogram data is processed, all chromatographic peaks in the curve are detected, and a detection result is generated based on the detected chromatographic peaks;
for example, generating the detection result based on the detected chromatographic peak includes integrating the detected chromatographic peak and computationally determining the area and height of the chromatographic peak.
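The integration of a detected peak can be illustrated by the following minimal Python sketch. The application does not specify the integration rule or the baseline, so the sketch assumes trapezoidal integration above a straight baseline drawn between the peak start point and the peak end point; the function name `peak_area_and_height` is hypothetical.

```python
def peak_area_and_height(times, signal, start, top, end):
    """Return (area, height) of one peak above a straight start-to-end baseline.

    Assumptions (not stated in the text): linear baseline between the start
    and end reference points, trapezoidal rule for the area.
    """
    if not (0 <= start < top < end < len(signal)):
        raise ValueError("expected 0 <= start < top < end < len(signal)")
    t0, t1 = times[start], times[end]
    y0, y1 = signal[start], signal[end]

    def baseline(t):
        # Linear interpolation between the two baseline anchor points.
        return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

    area = 0.0
    for i in range(start, end):
        left = signal[i] - baseline(times[i])
        right = signal[i + 1] - baseline(times[i + 1])
        area += 0.5 * (left + right) * (times[i + 1] - times[i])
    height = signal[top] - baseline(times[top])
    return area, height
```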
Similarly, in this embodiment, in the step S220, during the detection process of each chromatographic peak, the method includes:
step S221, comparing the slope of each point on the chromatogram curve with a threshold value, and determining a reference point of a chromatogram peak based on the comparison result, wherein the reference point comprises a peak starting point reference point, a temporary peak top point reference point and a peak end point reference point, and the threshold value is determined based on the calculation and analysis of the no-load output signal of the chromatogram instrument for generating the original spectrogram data;
specifically, comparing the slope of each point on the chromatographic curve with a threshold, and determining the reference point of the chromatographic peak based on the comparison result includes:
calculating, in real time, the slope $G_i$ of each point $i$ on the chromatographic curve during the sliding of the time window;
comparing the calculated slope $G_i$ with a predetermined first threshold $T_{thre}$, and when the slopes of two consecutive points are both greater than the first threshold, i.e. $G_i > T_{thre}$ and $G_{i-1} > T_{thre}$, comparing the ordinate values of the two points and determining the point having the smaller ordinate value as the peak start point reference point;
analyzing the sign change of the slope of the points after the peak start point reference point, and when the slope of one point is negative and the slope of the point before it is positive, i.e. $G_{i-1} > 0$ and $G_i < 0$, comparing the ordinate values of the two points and determining the point having the larger ordinate value as the temporary peak top point reference point; and
comparing the slope of the points after the temporary peak top point reference point with a predetermined second threshold $T'_{thre}$, and when the slopes of two consecutive points are both smaller than the second threshold, i.e. $G_i < T'_{thre}$ and $G_{i-1} < T'_{thre}$, comparing the ordinate values of the two points and determining the point having the smaller ordinate value as the peak end point reference point.
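The three comparisons above can be sketched in Python as follows. This is only an illustrative sketch: the slope is assumed to be a backward difference on uniformly sampled data (the application does not state how the slope is computed), the conditions are applied exactly as worded above, and the function names `slopes` and `find_reference_points` are hypothetical.

```python
def slopes(signal, dt=1.0):
    """Backward-difference slope G_i; G_0 is set to 0.0 (an assumption)."""
    return [0.0] + [(signal[i] - signal[i - 1]) / dt for i in range(1, len(signal))]


def find_reference_points(signal, t_first, t_second, dt=1.0):
    """Return (start, temporary_top, end) indices, or None if no full peak is found."""
    g = slopes(signal, dt)
    n = len(signal)

    # Peak start: two consecutive slopes above the first threshold;
    # keep the point with the smaller ordinate.
    start = None
    for i in range(1, n):
        if g[i] > t_first and g[i - 1] > t_first:
            start = i - 1 if signal[i - 1] < signal[i] else i
            break
    if start is None:
        return None

    # Temporary peak top: slope changes from positive to negative after the
    # start point; keep the point with the larger ordinate.
    top = None
    for i in range(start + 2, n):
        if g[i - 1] > 0 and g[i] < 0:
            top = i - 1 if signal[i - 1] > signal[i] else i
            break
    if top is None:
        return None

    # Peak end: two consecutive slopes below the second threshold after the
    # temporary top; keep the point with the smaller ordinate.
    for i in range(top + 2, n):
        if g[i] < t_second and g[i - 1] < t_second:
            end = i - 1 if signal[i - 1] < signal[i] else i
            return start, top, end
    return None
```

For the sample curve `[0, 0, 0, 1, 3, 6, 8, 6, 3, 1, 0, 0]` with thresholds 0.5 and -0.5, this sketch returns the indices 3, 6 and 8.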
The first threshold $T_{thre}$ and the second threshold $T'_{thre}$ in step S221 are determined based on calculation and analysis of the no-load output signal of the chromatographic instrument generating the original spectrogram data, and the specific determination process comprises the following steps:
carrying out statistical analysis on the slope change of an output baseline of the chromatographic instrument in no-load, calculating and determining the variance of the slope change, and further determining the standard deviation of the slope change;
In the field of analytical instrumentation, it is generally accepted that the slope change caused by random noise and baseline drift follows a normal distribution with zero mean. Therefore, it is only necessary to find its variance based on the following expression (1):
$$e^2 = \frac{1}{n}\sum_{i=1}^{n} f_i^2 \qquad (1)$$
In expression (1), $e^2$ represents the variance, $e$ represents the standard deviation, and $f_i$ represents the difference between the $i$-th sample and the mean; the number of samples $n$ is typically greater than 100.
Then, based on the statistical characteristics of the normal distribution (an interval of $\pm 3e$ contains about 99.73% of the baseline slopes, which are therefore treated as zero slope), three times the standard deviation of the slope change is taken as the first threshold, and minus three times the standard deviation of the slope change is taken as the second threshold, i.e. the first threshold $T_{thre} = 3e$ and the second threshold $T'_{thre} = -3e$.
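The determination of the two thresholds from the no-load baseline can be sketched as follows. This is a minimal Python sketch assuming uniformly sampled no-load data, a backward-difference slope, and the zero-mean assumption stated above, so that each slope sample is itself the deviation from the mean. The function name `slope_thresholds` is hypothetical.

```python
import math


def slope_thresholds(baseline, dt=1.0):
    """Return (first_threshold, second_threshold) = (3e, -3e).

    `baseline` is the no-load output signal of the instrument. The text
    suggests more than 100 samples; this sketch only requires two.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    g = [(baseline[i] - baseline[i - 1]) / dt for i in range(1, len(baseline))]
    # Zero-mean assumption: the deviation from the mean equals the slope itself.
    variance = sum(s * s for s in g) / len(g)
    e = math.sqrt(variance)
    return 3.0 * e, -3.0 * e
```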
After the step S221, performing a step S222, performing pattern recognition detection on the chromatographic curve near the position by using a gaussian wave as a matching wave based on the position of the chromatographic peak represented by the reference point on the chromatographic curve, and using a peak top in a recognition detection result as a peak top correction point;
Specifically, in step S222, similarly to the prior art, the chromatographic curve near the position is first taken as the curve to be detected, the waveform of the Gaussian wave is slid on the curve to be detected from the left end point to the right end point while the correlation coefficient between the two is calculated, and a correlation coefficient array of the chromatographic data of the curve to be detected relative to the Gaussian wave is obtained based on the calculation result. The calculation is expressed as expression (2):
$$R = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} \qquad (2)$$
In expression (2), $R$ represents the correlation coefficient, $x_i$ and $y_i$ respectively represent the chromatographic peak data and the matching wave data for which the correlation is currently calculated, and $\bar{x}$ and $\bar{y}$ respectively represent the mean values of the two sets of data.
Then, each correlation coefficient in the correlation coefficient array is compared with a preset value, the position of the Gaussian peak is determined based on the correlation coefficients whose values are greater than the preset value, and the point at that position on the curve to be detected is determined as the peak top point correction point.
For example, the preset value is 0.8; a correlation coefficient greater than 0.8 indicates that the two are strongly correlated. The process of determining the position of the Gaussian peak is the same as in the prior art and is not described in detail here.
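The sliding correlation in step S222 can be illustrated by the following minimal Python sketch. Since the application leaves the determination of the Gaussian peak position to the prior art, the sketch assumes that the position with the largest correlation coefficient above the preset value is taken and that the peak top lies at the center of the matching wave there. The template length, the `sigma` parameter and all function names are hypothetical.

```python
import math


def gaussian_template(length, sigma):
    """Gaussian matching wave sampled at `length` points, centered in the window."""
    center = (length - 1) / 2
    return [math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(length)]


def pearson(x, y):
    """Correlation coefficient R of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat data: R is undefined, treated here as "no match"
    return sxy / math.sqrt(sxx * syy)


def peak_top_correction(segment, template, preset=0.8):
    """Slide the template left to right; return (index or None, coefficient array).

    The returned index is relative to `segment` (the curve to be detected).
    """
    m = len(template)
    coeffs = [pearson(segment[s:s + m], template)
              for s in range(len(segment) - m + 1)]
    if not coeffs:
        return None, coeffs
    best = max(range(len(coeffs)), key=coeffs.__getitem__)
    if coeffs[best] <= preset:
        return None, coeffs
    return best + (m - 1) // 2, coeffs
```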
After step S222, step S223 is performed: by comparing the ordinate values of the peak top point correction point and the temporary peak top point reference point, the point with the larger ordinate value is determined as the peak top point reference point.
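The correction in step S223 amounts to a single comparison, as in the following minimal Python sketch. The fallback to the temporary peak top when pattern matching yields no correction point is an assumption of this sketch, and the function name `corrected_peak_top` is hypothetical.

```python
def corrected_peak_top(signal, temporary_top, correction_top):
    """Return the index of the peak top point reference point.

    Both indices refer to the same `signal`. If no correction point is
    available (None), the temporary peak top is kept (an assumption).
    """
    if correction_top is None:
        return temporary_top
    if signal[correction_top] > signal[temporary_top]:
        return correction_top
    return temporary_top
```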
In this embodiment, step S224 is finally performed to determine the chromatographic peak characterized by the peak start point reference point, the peak end point reference point, and the peak top point reference point as the detected chromatographic peak.
In this embodiment, in step S220, the time window is slid, and in each chromatographic peak detection process, steps S221 to S224 are repeated until all raw spectrogram data are processed.
In the technical scheme of the application, chromatographic peaks in liquid chromatograms and gas chromatograms are detected based on pattern matching combined with the slope of the chromatographic curve, and the advantages of the two modes are combined (for example, pattern recognition is insensitive to noise and to changes in chromatographic peak width and amplitude, and has good anti-interference performance, fault tolerance and robustness), so that the overall detection reliability is improved. In the analysis method, the spectrogram data is identified by sliding a time window, so that detection and analysis can be performed while the spectrogram data is being output, and the detection and analysis results can therefore be output more quickly.
In addition, the threshold value in the slope detection mode in the method is automatically calculated and determined based on the self signal of the instrument, and the peak characteristic point detected by the mode matching is only used as a reference point to correct the peak characteristic point obtained based on the slope detection, so that the defects of the two modes in the prior art are also overcome.
In addition, in order to facilitate a user to quickly understand and evaluate the performance of the detection and analysis method of the present application, in a specific application scenario, on the basis of the above embodiments, the technical solution of the present application further includes:
detecting and processing original spectrogram data by adopting a reference chromatographic spectrogram detection algorithm to obtain a reference detection result; comparing and analyzing the detection result with the reference detection result, generating a detection evaluation report, and displaying and outputting the detection evaluation report; the reference chromatogram detection algorithm refers to other detection and analysis methods which are different from the detection and analysis method and have the same function and purpose as the detection and analysis method.
The comparing and analyzing the detection result with the reference detection result comprises: matching the detected chromatographic peaks in the detection result and the reference detection result, determining the matched chromatographic peaks (the process is shown in fig. 3), and generating a detection evaluation report based on the proportion of the matched chromatographic peaks in the detected chromatographic peaks and the difference of the matched chromatographic peaks (the process is shown in fig. 4).
In other words, in the evaluation and analysis process of the present application, the results are not compared one-to-one in time order. Instead, the peak results are first sorted in descending order of peak height, where each peak result includes the start point, the end point, the peak height, the area, the retention time (the time corresponding to the peak top), the baseline start point and the baseline end point, and the sorted results are then compared and matched (each peak result is one record, and the whole record moves together when the peak-height order changes).
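The peak result record and the descending sort by peak height can be sketched in Python as follows. The field names and the `PeakResult` and `sort_by_height` names are hypothetical; only the list of fields is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakResult:
    """One detected peak; the record moves as a whole when sorted."""
    start: float
    end: float
    height: float
    area: float
    retention_time: float  # time corresponding to the peak top
    baseline_start: float
    baseline_end: float


def sort_by_height(peaks):
    """Return the peak results in descending order of peak height."""
    return sorted(peaks, key=lambda p: p.height, reverse=True)
```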
As shown in fig. 3, in this implementation, the comparison and matching considers the two conditions of peak height and retention time together. The peak heights are compared first; if the difference between the peak heights obtained by the compared algorithms is within a certain range, the retention times are compared; if that difference is also within a certain range, the corresponding peaks of the compared algorithms are considered to match, and the comparison result is stored for the subsequent calculation of the index values.
If either of the two conditions is not met, the data currently being compared do not match; one index is then held fixed and the other index is moved backward until matching data are found or the other index reaches the last position. It should be noted that different algorithms do not always give the same result for the same peak of the same chromatographic data; because the algorithms differ, the results differ somewhat, but the difference is not very large and therefore lies within a certain range. Of course, peaks of similar height may exist in the same chromatographic data. Therefore, when the results are compared, the application also adds the comparison of retention time as appropriate to ensure the accuracy of the results.
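The matching procedure can be illustrated by the following minimal Python sketch. Each peak is represented only by a (peak height, retention time) pair. The tolerance values, the use of a relative tolerance for the peak height, and the function name `match_peaks` are assumptions of this sketch, since the application only states that the differences must lie within a certain range.

```python
def match_peaks(peaks_1, peaks_2, height_tol=0.1, rt_tol=0.05):
    """Match peaks of two detection results; return a list of (index_1, index_2).

    peaks_1, peaks_2: lists of (height, retention_time) tuples.
    height_tol: relative peak-height tolerance (hypothetical value).
    rt_tol: absolute retention-time tolerance (hypothetical value).
    """
    # Both results are traversed in descending order of peak height.
    order_1 = sorted(range(len(peaks_1)), key=lambda i: peaks_1[i][0], reverse=True)
    order_2 = sorted(range(len(peaks_2)), key=lambda j: peaks_2[j][0], reverse=True)
    used_2 = set()
    matches = []
    for i in order_1:  # this index is held fixed ...
        h1, rt1 = peaks_1[i]
        for j in order_2:  # ... while the other index moves backward
            if j in used_2:
                continue
            h2, rt2 = peaks_2[j]
            height_ok = abs(h1 - h2) <= height_tol * max(abs(h1), abs(h2))
            if height_ok and abs(rt1 - rt2) <= rt_tol:
                matches.append((i, j))
                used_2.add(j)
                break
    return matches
```

The retention-time check is what prevents two different peaks of similar height from being paired in this sketch.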
After the matching process is completed, the matched peak result information is stored in one-to-one correspondence, the corresponding indexes are then calculated from the peak result information, and the visual output stage of the detection evaluation report shown in fig. 4 is entered. Specifically, the detection evaluation report in this stage includes:
A. Viewing the approximate matching result
The numbers of matched and unmatched peaks are plotted for visual display, for example as a stacked bar chart. Assume that the number of peaks detected by method1 (the detection and analysis method of the present application) is n, the number of peaks detected by method2 (the reference chromatogram detection algorithm) is m, and the number of matched peaks is s, where s ≤ min(n, m). The bar contains three sections: the lower section is the number n-s of peaks of method1 that are not matched, the middle section is the number s of peaks matched by the two algorithms, and the upper section is the number m-s of peaks of method2 that are not matched. The larger the proportion of the middle section, the more peaks the two algorithms detect in common and the closer their detection performance.
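The three sections of the bar and the proportion of the middle section can be computed as in the following minimal Python sketch. The function names `match_counts` and `matched_share` are hypothetical, and the plotting itself is omitted.

```python
def match_counts(n, m, s):
    """Sizes of the three bar sections for n, m detected and s matched peaks."""
    if not 0 <= s <= min(n, m):
        raise ValueError("s must satisfy 0 <= s <= min(n, m)")
    return {"method1_only": n - s, "matched": s, "method2_only": m - s}


def matched_share(n, m, s):
    """Proportion of the middle (matched) section within the whole bar."""
    total = n + m - s  # (n - s) + s + (m - s)
    return s / total if total else 0.0
```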
B. In most cases, chromatographic data is noisy owing to the conditions of the instrument and the experiment, and different algorithms differ in their sensitivity to noise. In matching, small peaks often fail to match, which may make the result obtained in A misleading to some extent. Therefore, a bar chart is added here showing, for each algorithm, the sum of the areas of the matched peaks as a percentage of the total area of all peaks. If the two percentages are very close and both are large, the large peaks of the two groups of results are considered to be matched, and the difference in detection performance of the two algorithms is verified from another index.
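The area percentage described in B can be computed for each algorithm as in the following minimal Python sketch. The function name `matched_area_percent` is hypothetical; `matched_indices` is assumed to hold the indices of that algorithm's peaks that were matched.

```python
def matched_area_percent(areas, matched_indices):
    """Sum of matched peak areas as a percentage of the total peak area."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    matched = sum(areas[i] for i in set(matched_indices))
    return 100.0 * matched / total
```

For example, if the peak areas found by one algorithm are 50, 30, 15 and 5 and only the first two peaks are matched, the percentage is 80%, even though half of the peaks by count are unmatched.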
C. Checking whether the matching result is abnormal or not
Specifically, for example, the std (standard deviation) of the differences between the start points and end points of the matched peaks and the std of the peak area differences are calculated. If the std fluctuates within a certain range, the peaks matched by the two compared algorithms are considered to have no prominent abnormality at the start and end points. Otherwise, abnormal points are considered to exist, and the abnormal data can then be used to check for abnormal behavior of the detection algorithm.
Furthermore, it is easy to understand that, for different original data, the algorithms are compared and analyzed based on the corresponding results; each data set corresponds to one evaluation report, and an overall evaluation is formed from these reports to comprehensively evaluate the performance of the algorithm.
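The check in C can be sketched in Python as follows. The sketch assumes that the two input lists hold the same quantity (for example the start points, the end points or the areas) for the matched peaks in the same order, uses the population standard deviation, and treats `limit` as a hypothetical user-chosen bound, since the application only refers to a certain range.

```python
import statistics


def difference_std(values_1, values_2):
    """Population standard deviation of the pairwise differences."""
    if len(values_1) != len(values_2) or not values_1:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [a - b for a, b in zip(values_1, values_2)]
    return statistics.pstdev(diffs)


def has_anomaly(values_1, values_2, limit):
    """True if the spread of the differences exceeds the chosen limit."""
    return difference_std(values_1, values_2) > limit
```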
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 5, the electronic device 400 includes:
a memory 401 having an executable program stored thereon;
a processor 402 for executing the executable program in the memory 401 to implement the steps of the above method.
With respect to the electronic device 400 in the above embodiment, the specific manner of executing the program in the memory 401 by the processor 402 thereof has been described in detail in the embodiment related to the method, and will not be elaborated herein.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.