CN114357885B - Photosynthetic effective radiation scattering proportion prediction method fusing multisource data - Google Patents
- Publication number
- CN114357885B (application CN202210010803.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- data set
- par
- dif
- site
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a method for predicting the scattered fraction of photosynthetically active radiation (PAR) by fusing multi-source data, and belongs to the technical field of scattered radiation ratio estimation. First, a site data set, a satellite data set, a reanalysis data set, and angle data are acquired. From these data, random forest decision trees can be built directly that relate the measured site PAR and PARdif to the satellite and reanalysis PAR and PARdif, and the scattered radiation ratio is then calculated from the PAR and PARdif that the trees predict. Alternatively, the scattered radiation ratios of the measured sites, the satellite data, and the reanalysis data are first computed from the acquired data, a random forest decision tree is then built that relates the site ratio to the satellite and reanalysis ratios, and the scattered radiation ratio is predicted directly from that tree. Both approaches address the low accuracy of the scattered radiation ratio provided by existing satellite remote sensing and reanalysis products, as well as the sparseness of ground observation sites, which makes them better suited to regional or global studies of the scattered radiation ratio.
Description
Technical field
The invention relates to a method for predicting the scattered fraction of photosynthetically active radiation by fusing multi-source data, and belongs to the technical field of scattered radiation ratio estimation.
Background
The scattered radiation ratio is an important indicator for studying how the photosynthetic capacity of vegetation varies with the sun. It can be derived accurately from the observations of existing ground flux sites, but because those sites are sparsely distributed, site-based ratios are poorly suited to regional or global studies. Satellite observation and reanalysis products do provide global scattered radiation ratio data. However, the scattered radiation in satellite products is computed by retrieval (inversion) algorithms, and the scattered radiation ratio in reanalysis products is likewise computed by a model. Both therefore contain errors, so the accuracy of the scattered radiation ratio is low and cannot be guaranteed at regional or global scale, which hampers subsequent research.
Summary of the invention
The purpose of the invention is to provide a method for predicting the scattered fraction of photosynthetically active radiation by fusing multi-source data, in order to solve the low accuracy of the scattered radiation ratio provided by existing satellite and reanalysis data.
A method proposed by the invention for predicting the scattered fraction of photosynthetically active radiation by fusing multi-source data comprises the following steps:
1) Acquire a site data set, a satellite data set, a reanalysis data set, and angle data for the same time and place as the satellite and reanalysis data sets. The site data set contains PAR and PARdif data together with the corresponding site coordinates and acquisition times; the satellite data set and the reanalysis data set each contain PAR and PARdif data.
2) Match the satellite and reanalysis data sets to the sites in space and time using the site coordinates and acquisition times, and convert the angle data of each successfully matched record into the cosine of the solar zenith angle.
3) Compute the scattered radiation ratio of the sites from the PAR and PARdif data in the site data set, and compute the corresponding ratios from the matched satellite and reanalysis data sets. The site ratio, the satellite ratio, the reanalysis ratio, and the cosine of the solar zenith angle common to all three form the multi-source scattered radiation ratio training data set.
4) Train a random forest decision tree on the multi-source scattered radiation ratio training data set to obtain a model that relates the site ratio to the satellite ratio, the reanalysis ratio, and the cosine of the solar zenith angle, and use that model to predict the scattered radiation ratio at the target time and place.
This method uses a random forest decision tree to establish a regression between the scattered radiation ratio measured at the sites and the ratios from the satellite and reanalysis data sets. The resulting model predicts the scattered radiation ratio at the target time and place directly and yields more accurate ratios across different times and places. Because it draws on several sources of scattered radiation ratio data, it avoids the low accuracy of existing ratio estimates and is better suited to regional or global studies of the scattered radiation ratio.
Further, the site data set is the FLUXNET data set (the international flux observation network), the satellite data set is the CERES data set or the BESS data set, and the reanalysis data set is the MERRA2 data set.
Further, to ensure accurate data selection and an accurate random forest regression model, in step 3) records whose scattered radiation ratio is less than 0 or greater than 1 in the site, satellite, or reanalysis data set are removed from the multi-source scattered radiation ratio training data set.
Further, to ensure reliable training of the random forest regression model, the random forest decision tree in step 4) is trained with K-fold cross-validation.
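As a minimal sketch of what K-fold cross-validation does with the training records, the following splits sample indices into K folds and uses each fold once for validation. The function name `k_fold_indices`, the default `k=5`, and the fixed seed are illustrative assumptions; the patent does not state a value of K.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(10, k=5))
```

Each record is used for validation exactly once and for training in the other K-1 rounds, so the averaged validation error reflects performance on data the model did not see during fitting.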
Further, to match the satellite and reanalysis data sets accurately to the site data set, the spatio-temporal matching proceeds as follows: the time and location information of the site data set is obtained first, and bilinear interpolation or the nearest-neighbor method is then used to extract point data for the corresponding time and place from the satellite and reanalysis data sets.
The invention also proposes a method for predicting the scattered fraction of photosynthetically active radiation by fusing multi-source data that comprises the following steps:
1) Acquire a site data set, a satellite data set, a reanalysis data set, and angle data for the time and place corresponding to the satellite and reanalysis data sets. The site data set contains PAR and PARdif data together with the corresponding site coordinates and acquisition times; the satellite data set and the reanalysis data set each contain PAR and PARdif data.
2) Match the satellite and reanalysis data sets to the sites in space and time using the site coordinates and acquisition times, and convert the angle data of each successfully matched record into the cosine of the solar zenith angle. The site PAR and PARdif, the satellite PAR and PARdif, the reanalysis PAR and PARdif, and the cosine of the solar zenith angle common to all three form the multi-source PAR and PARdif training data set.
3) Train random forest decision trees on the multi-source PAR and PARdif data to obtain PAR and PARdif models, and use them to predict PAR and PARdif at the target time and place.
4) Calculate the scattered radiation ratio at the target time and place from the predicted PAR and PARdif.
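The indirect route in steps 3) and 4) can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the patent's implementation: the data are synthetic, the feature layout and hyperparameters are invented for the example, and a single multi-output `RandomForestRegressor` is used where the patent may equally intend separate models for PAR and PARdif.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def scattered_ratio(par, par_dif):
    """Return PARdif / PAR element-wise, with NaN where PAR is not positive."""
    par = np.asarray(par, dtype=float)
    par_dif = np.asarray(par_dif, dtype=float)
    out = np.full(par.shape, np.nan)
    np.divide(par_dif, par, out=out, where=par > 0)
    return out


# Synthetic stand-ins for matched records (all values are made up).
rng = np.random.default_rng(0)
n = 200
cos_z = rng.uniform(0.1, 1.0, n)
site_par = 2000.0 * cos_z * rng.uniform(0.5, 1.0, n)
site_dif = site_par * rng.uniform(0.2, 0.9, n)

# Assumed feature order: satellite PAR, satellite PARdif,
# reanalysis PAR, reanalysis PARdif, cos Z.
X = np.column_stack([
    site_par * rng.normal(1.0, 0.10, n),
    site_dif * rng.normal(1.0, 0.10, n),
    site_par * rng.normal(1.0, 0.15, n),
    site_dif * rng.normal(1.0, 0.15, n),
    cos_z,
])
Y = np.column_stack([site_par, site_dif])  # targets: site PAR and site PARdif

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, Y)

pred = model.predict(X)                      # column 0: PAR, column 1: PARdif
kd = scattered_ratio(pred[:, 0], pred[:, 1])  # step 4): ratio of the predictions
```

Predicting both quantities with one model keeps each predicted PARdif below the matching predicted PAR whenever that holds for every training record, because each leaf averages whole target pairs.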
This method uses random forest decision trees to establish a regression between the PAR and PARdif measured at the sites and the PAR and PARdif in the satellite data set. The resulting PAR and PARdif models predict PAR and PARdif at the target time and place, from which the scattered radiation ratio is then calculated. The method avoids the low accuracy of ratios computed directly from the PAR and PARdif in satellite data sets: accurate predictions of PAR and PARdif give an accurate scattered radiation ratio, which is better suited to regional or global studies.
Further, the site data set is the FLUXNET data set (the international flux observation network), the satellite data set is the CERES data set or the BESS data set, and the reanalysis data set is the MERRA2 data set.
Further, to ensure accurate data selection and an accurate random forest regression model, in step 2) the PAR and PARdif records whose scattered radiation ratio is less than 0 or greater than 1 in the site, satellite, or reanalysis data set are removed during data matching.
Further, to ensure reliable training of the random forest regression model, the random forest decision trees in step 3) are trained with K-fold cross-validation.
Further, to match the satellite and reanalysis data sets accurately to the site data set, the spatio-temporal matching proceeds as follows: the time and location information of the site data set is obtained first, and bilinear interpolation or the nearest-neighbor method is then used to extract point data for the corresponding time and place from the satellite and reanalysis data sets.
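For illustration, bilinear interpolation of a gridded field at a site location can be sketched as follows. The function name, the argument layout (`grid[i][j]` at `lats[i]`, `lons[j]`, both axes ascending), and the tiny example grid are assumptions made for this sketch.

```python
from bisect import bisect_right


def bilinear_sample(grid, lats, lons, lat, lon):
    """Interpolate grid[i][j] (defined at lats[i], lons[j]) at the point (lat, lon)."""
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so i + 1 and j + 1 exist.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    lower = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    upper = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return lower * (1 - ty) + upper * ty


example_grid = [[0.0, 10.0], [20.0, 30.0]]
center_value = bilinear_sample(example_grid, [0.0, 1.0], [0.0, 1.0], 0.5, 0.5)
```

The nearest-neighbor alternative simply takes the value of the closest grid cell; bilinear interpolation instead weights the four surrounding cells by distance.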
Description of drawings
Figure 1 is a flow chart of the method of the invention for predicting the scattered fraction of photosynthetically active radiation by fusing multi-source data;
Figure 2 compares the mean Kd predicted directly by the multi-product RFM model with the mean Kd measured at the sites in Experiment 1;
Figure 3(a) is a box plot comparing the metric R for Kd predicted directly by the multi-product and single-product models in Experiment 1;
Figure 3(b) is a box plot comparing the metric MSE for Kd predicted directly by the multi-product and single-product models in Experiment 1;
Figure 3(c) is a box plot comparing the metric RMSE for Kd predicted directly by the multi-product and single-product models in Experiment 1;
Figure 3(d) is a box plot comparing the metric MAE for Kd predicted directly by the multi-product and single-product models in Experiment 1;
Figure 3(e) is a box plot comparing the metric R² for Kd predicted directly by the multi-product and single-product models in Experiment 1;
Figure 4 shows the distribution of Kd predicted for different vegetation types in Experiment 2;
Figure 5 is a scatter plot of predicted Kd against PAR in Experiment 1;
Figure 6 is a scatter plot of predicted Kd against PAR in Experiment 2.
Detailed description
The method provided by the invention comprises building a Kd model to predict Kd (the scattered radiation ratio) directly, and building PAR and PARdif models to predict Kd indirectly. The invention is described in further detail below with reference to the drawings.
Embodiment: direct prediction of Kd
Figure 1 shows the procedure for predicting Kd directly by building a Kd random forest regression model. First, the PAR and PARdif data of the site, satellite, and reanalysis data sets are acquired, together with the angle data for the time and place corresponding to the satellite and reanalysis data sets. The satellite and reanalysis data sets are then matched to the sites in space and time using the site coordinates and acquisition times, and the angle data of each matched record are converted into the cosine of the solar zenith angle. Next, the scattered radiation ratio of the sites is computed from the site PAR and PARdif, and the corresponding ratios are computed from the matched satellite and reanalysis data sets. The site ratio, the satellite ratio, the reanalysis ratio, and the cosine of the solar zenith angle common to all three form the multi-source scattered radiation ratio training data set. Finally, this training data set is used to train a random forest decision tree that relates the site ratio to the satellite and reanalysis ratios and predicts the scattered radiation ratio at the target time and place, which addresses the low accuracy of the ratio provided by existing satellite data.
Step 1. Acquire data
1) Measured site data set
The measured site data used by the invention come from the FLUXNET data set (the international flux observation network). FLUXNET provides continuous measurements from a large number of surface sites since 1991, including total photosynthetically active radiation, PAR (μmol m⁻² s⁻¹), and scattered photosynthetically active radiation, PARdif (μmol m⁻² s⁻¹), at half-hourly intervals. FLUXNET now comprises more than 200 sites distributed around the world. This embodiment uses daily PAR and PARdif data (both in μmol m⁻² s⁻¹) from 42 sites of the FLUXNET2015 product for 2000 to 2010, together with the coordinates of each site and the acquisition times. Here PAR denotes photosynthetically active radiation and PARdif its scattered component.
2) Satellite data set
The satellite data set used by the invention is at least one of the CERES data set and the BESS data set.
CERES data set:
The Clouds and the Earth's Radiant Energy System (CERES) is described as the only project in the world whose primary purpose is to observe the Earth's radiation budget (ERB) and to produce a global ERB climate data record. The ERB describes how the Earth's climate depends on the solar radiation the Earth absorbs, the infrared energy it emits to space, and the difference between the two. The CERES project provides satellite-based ERB observations. It combines measurements from CERES instruments, which directly measure reflected solar radiation, on several satellites with data recorded by other instruments to generate a complete set of ERB data products that are used in climate, weather, and other research. This embodiment uses global daily data for 2000 to 2010 from the SYN1deg-Level3 product of CERES: scattered photosynthetically active radiation, PARdif (W m⁻²), and direct photosynthetically active radiation, PARdir (W m⁻²), at a spatial resolution of 1° × 1°. PAR is obtained as the sum of PARdif and PARdir, so the quantities finally used are the CERES PARdif (W m⁻²) and the calculated PAR (W m⁻²).
BESS data set:
The Breathing Earth System Simulator (BESS) is a simplified process model that couples atmospheric and canopy radiative transfer, canopy photosynthesis, transpiration, and energy balance. It combines an atmospheric radiative transfer model and an artificial neural network with MODIS atmospheric products to generate daily products at 5 km resolution. This embodiment uses the PAR (mol m⁻² d⁻¹) and PARdif (mol m⁻² d⁻¹) data of BESS_Rad for 2000 to 2010 at 5 km resolution.
3) Reanalysis data set
MERRA2 reanalysis data set:
MERRA2 (Modern-Era Retrospective analysis for Research and Applications, Version 2) is a long-time-series reanalysis data set produced and released by NASA. It contains many meteorological variables, such as net radiation, temperature, relative humidity, and wind speed. MERRA2 covers the globe at a spatial resolution of 0.5° × 0.625° and a temporal resolution of 1 hour. This embodiment uses the global hourly mean data for 2000 to 2010 of scattered photosynthetically active radiation, PARdif (W m⁻²), and direct photosynthetically active radiation, PARdir (W m⁻²). PAR is obtained as the sum of PARdif and PARdir, so the quantities finally used are the MERRA2 PARdif (W m⁻²) and the calculated PAR (W m⁻²).
4) Angle data
The angle data acquired by the invention refer to the same time and place as the acquired site, satellite, and reanalysis data sets. They comprise the acquisition time, longitude and latitude, solar altitude, and solar declination. The cosine of the solar zenith angle is calculated from these data and is used to represent time, longitude, and latitude.
Step 2. Data processing
To ensure that the random forest decision tree relating the site data set to the satellite and reanalysis data sets is accurate, the invention first extracts the satellite and reanalysis data that match each site's location and acquisition time, which gives spatio-temporally matched multi-source PAR and PARdif data. The matched data are then quality-screened to ensure accurate data selection. In addition, the invention uses the angle data to express the three features of time, longitude, and latitude as the cosine of the solar zenith angle.
All acquired data are first matched in space and time by extracting the satellite and reanalysis data at the same time and location as each site record. For example, the longitude and latitude of site BE-Bra are known, as is the acquisition time of each of its records. Suppose BE-Bra is located at (A°S, B°E) and one of its records was acquired on 1 January 2000; the CERES record at location (A°S, B°E) on 1 January 2000 is then extracted as its match. All other data are matched in the same way. In this embodiment only 42 FLUXNET sites have PAR and PARdif observations, whereas CERES, BESS, and MERRA2 are all gridded (areal) data, so point data must be extracted from the grids. This embodiment first uses bilinear interpolation to bring the gridded data sets to a common spatial resolution of 1° × 1°, and then uses the nearest-neighbor method to extract the data of the 42 sites from the three gridded data sets. In other embodiments, the nearest-neighbor method may also be used both to bring the data sets to a common resolution and to extract the point data. The result is spatio-temporally matched PAR and PARdif data from the FLUXNET sites, BESS, CERES, and the MERRA2 reanalysis.
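The nearest-neighbor extraction of a site's time series from a gridded product can be sketched as follows. The data layout (`cube[t][i][j]` on `dates[t]` at `lats[i]`, `lons[j]`), the function names, and the example coordinates are assumptions for illustration; the coordinates are not those of any real FLUXNET site, and longitude wrap-around at the date line is ignored.

```python
def nearest_index(axis, value):
    """Index of the axis entry closest to value."""
    return min(range(len(axis)), key=lambda k: abs(axis[k] - value))


def extract_site_series(cube, dates, lats, lons, site_lat, site_lon, site_dates):
    """Return {date: value} for the grid cell nearest the site, on the site's dates only."""
    i = nearest_index(lats, site_lat)
    j = nearest_index(lons, site_lon)
    date_to_t = {d: t for t, d in enumerate(dates)}
    # Dates with no gridded record are skipped, so only matched pairs remain.
    return {d: cube[date_to_t[d]][i][j] for d in site_dates if d in date_to_t}


dates = ["2000-01-01", "2000-01-02"]
lats = [50.5, 51.5]
lons = [3.5, 4.5]
cube = [
    [[1.0, 2.0], [3.0, 4.0]],   # values on 2000-01-01
    [[5.0, 6.0], [7.0, 8.0]],   # values on 2000-01-02
]
matched = extract_site_series(cube, dates, lats, lons, 51.3, 4.4,
                              ["2000-01-01", "2000-01-03"])
```

Here the site falls in the cell at (51.5, 4.5), and the site date with no gridded counterpart is dropped from the result.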
The PAR and PARdif data in the site, satellite, and reanalysis data sets do not all use the same units: BESS uses mol m⁻² d⁻¹, FLUXNET uses μmol m⁻² s⁻¹, and CERES and MERRA2 use W m⁻². To ensure that the subsequent regression model is accurate, all data are converted to μmol m⁻² s⁻¹ with the following conversion formulas.
The conversion formula for the CERES and MERRA2 PAR/PARdif data is:
$$1\ \mathrm{W\,m^{-2}} = 4.56\ \mathrm{\mu mol\,m^{-2}\,s^{-1}} \tag{1}$$
The conversion formula for the BESS PAR/PARdif data is:
$$1\ \mathrm{mol\,m^{-2}\,d^{-1}} = 11.5741\ \mathrm{\mu mol\,m^{-2}\,s^{-1}} \tag{2}$$
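The two conversions can be applied with a small lookup, sketched below. The unit labels and the function name are assumptions for illustration; the factor 4.56 is taken from equation (1), and the factor in equation (2) follows from 10⁶ μmol per mol divided by 86,400 s per day.

```python
# Conversion factors to umol m-2 s-1.
FACTORS = {
    "W m-2": 4.56,                 # equation (1), used for CERES and MERRA2
    "mol m-2 d-1": 1e6 / 86400.0,  # equation (2), about 11.5741, used for BESS
    "umol m-2 s-1": 1.0,           # FLUXNET data are already in the target unit
}


def to_umol_m2_s(value, unit):
    """Convert a PAR or PARdif value to umol m-2 s-1."""
    try:
        return value * FACTORS[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


ceres_par = to_umol_m2_s(100.0, "W m-2")      # a made-up CERES value
bess_par = to_umol_m2_s(30.0, "mol m-2 d-1")  # a made-up BESS value
```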
Further, to simplify the subsequent modeling, the invention expresses the three features of time, longitude, and latitude as the cosine of the solar zenith angle (cos Z). cos Z is calculated on the principle that the solar zenith angle and the solar altitude angle are complementary. The calculation applies three corrections to the solar zenith angle: an annual correction, a longitude correction, and a time-of-day correction. The correction formula is:
$$\sin(A + Z) = 1 \tag{5}$$
where Z is the solar zenith angle, A is the solar altitude angle, h⊙ is the solar altitude, δ is the solar declination, φ is the local geographic latitude, and τ is the solar hour angle at that time.
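As a sketch of how cos Z can be computed from the quantities defined above, the code below uses the standard astronomical relation cos Z = sin φ sin δ + cos φ cos δ cos τ, with φ the geographic latitude, δ the solar declination, and τ the solar hour angle. This relation is an assumption introduced here and is not reproduced from the patent's own equations. The function name and the example angles are likewise illustrative.

```python
import math


def cos_solar_zenith(lat_deg, decl_deg, hour_angle_deg):
    """cos Z from latitude, solar declination, and solar hour angle (all in degrees)."""
    phi, delta, tau = (math.radians(v) for v in (lat_deg, decl_deg, hour_angle_deg))
    return (math.sin(phi) * math.sin(delta)
            + math.cos(phi) * math.cos(delta) * math.cos(tau))


cos_z = cos_solar_zenith(60.0, 0.0, 0.0)  # latitude 60 degrees, equinox, solar noon

# Consistency with equation (5): the altitude angle A and the zenith angle Z
# are complementary, so sin(A + Z) equals 1.
zenith = math.acos(cos_z)
altitude = math.asin(cos_z)
check = math.sin(altitude + zenith)
```

Negative values of cos Z correspond to the sun being below the horizon.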
Further, the satellite and reanalysis data sets have missing observations at some sites (recorded as 0 or -9999). To avoid errors and to ensure accurate data selection and model building, all data are quality-screened: records whose PAR or PARdif value is 0 or -9999 are removed, and the data at the corresponding site are no longer used to build the model.
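A minimal sketch of this screening follows, assuming each matched record is a dictionary that holds the PAR and PARdif values of every source. The field names and the function name are hypothetical, and the sketch drops individual matched records rather than whole sites.

```python
MISSING_VALUES = (0, -9999)  # markers for missing observations


def drop_missing(records, fields):
    """Keep only records in which none of the given fields holds a missing-value marker."""
    return [r for r in records if not any(r[f] in MISSING_VALUES for f in fields)]


fields = ("sat_par", "sat_par_dif", "rean_par", "rean_par_dif")
records = [
    {"sat_par": 410.0, "sat_par_dif": 150.0, "rean_par": 395.0, "rean_par_dif": 160.0},
    {"sat_par": -9999.0, "sat_par_dif": 150.0, "rean_par": 395.0, "rean_par_dif": 160.0},
    {"sat_par": 410.0, "sat_par_dif": 0.0, "rean_par": 395.0, "rean_par_dif": 160.0},
]
clean = drop_missing(records, fields)
```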
Step 3. Generate the multi-source scattered radiation ratio training data set
The scattered radiation ratio of the sites is computed from the PAR and PARdif data in the site data set, and the corresponding ratios are computed from the PAR and PARdif in the matched satellite data set and in the matched reanalysis data set. The ratio is PARdif divided by PAR:
$$K_d = \frac{PAR_{dif}}{PAR}$$
The site ratio, the satellite ratio, the reanalysis ratio, and the cosine of the solar zenith angle common to all three are then collected. If Kd in any of the data sets is less than or equal to 0 or greater than 1, that Kd value is removed together with the spatio-temporally matched data of the other data sets. The site ratio, satellite ratio, reanalysis ratio, and cosine of the solar zenith angle that remain form the multi-source scattered radiation ratio training data set.
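The ratio calculation and range screening can be sketched as below. The record layout, the source labels in `SOURCES`, and the output field names are assumptions for illustration; the rule itself (drop a matched record when any source has Kd ≤ 0 or Kd > 1) follows the text above.

```python
SOURCES = ("site", "sat", "rean")  # hypothetical labels: site, satellite, reanalysis


def scattered_ratio(par, par_dif):
    """Return Kd = PARdif / PAR, or None when PAR is not positive."""
    return par_dif / par if par > 0 else None


def build_training_rows(matched_records):
    """Compute Kd per source and keep only records where every Kd lies in (0, 1]."""
    rows = []
    for rec in matched_records:
        kd = {s: scattered_ratio(rec[f"{s}_par"], rec[f"{s}_par_dif"]) for s in SOURCES}
        if all(k is not None and 0.0 < k <= 1.0 for k in kd.values()):
            rows.append({
                "sat_kd": kd["sat"],
                "rean_kd": kd["rean"],
                "cos_z": rec["cos_z"],
                "site_kd": kd["site"],  # regression target
            })
    return rows


matched_records = [
    {"site_par": 400.0, "site_par_dif": 200.0, "sat_par": 500.0, "sat_par_dif": 200.0,
     "rean_par": 400.0, "rean_par_dif": 100.0, "cos_z": 0.8},
    # Satellite Kd exceeds 1, so the whole matched record is dropped.
    {"site_par": 400.0, "site_par_dif": 200.0, "sat_par": 300.0, "sat_par_dif": 360.0,
     "rean_par": 400.0, "rean_par_dif": 100.0, "cos_z": 0.7},
]
rows = build_training_rows(matched_records)
```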
In this embodiment, step 2 yields 94,948 PAR records and 52,232 PARdif records, so dividing PARdif by PAR also gives 52,232 Kd records. The satellite Kd data, the reanalysis Kd data, the FLUXNET site Kd data, and the solar zenith angle cosine common to all three are pooled, and every Kd less than or equal to 0 or greater than 1 is removed. The result is usable Kd data for 39 sites, with the corresponding satellite Kd data, reanalysis Kd data, and solar zenith angle cosine, which serve as the multi-source scattered radiation ratio training data set for later use.
Step 4. Build and train the random forest decision trees
Random forest is an ensemble learning model in machine learning and a statistical learning method. It uses bootstrap resampling to draw repeated random samples from the original sample, then builds each decision tree by taking the feature with the greatest weight as the root node and adding branches in turn. Finally, the outputs of the trees are combined (by voting for classification, by averaging for regression) to give the final prediction. By using the random forest method, the invention obtains more accurate predictions and avoids the poor fit to new data sets that overfitting causes.
The random forest used in the invention (the RFM model) is a model from the Python scikit-learn library. The multi-source scattered radiation ratio training data set obtained in step 3 is fed into the RFM model to train the random forest decision trees. The trained forest can then predict Kd at any time and location, providing solid data for subsequent regional or global studies of the scattered radiation ratio.
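A minimal sketch of this training step with scikit-learn's `RandomForestRegressor` is given below. The data are synthetic stand-ins, not FLUXNET, CERES, BESS, or MERRA2 values, and the column order and hyperparameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic feature columns standing in for the training set (not real data).
X = np.column_stack([
    rng.uniform(0.2, 0.9, n),  # CERES Kd
    rng.uniform(0.2, 0.9, n),  # BESS Kd
    rng.uniform(0.2, 0.9, n),  # MERRA2 Kd
    rng.uniform(0.1, 1.0, n),  # cosine of the solar zenith angle
])
# Synthetic target standing in for the site-measured Kd.
y = np.clip(X[:, :3].mean(axis=1) + rng.normal(0, 0.02, n), 0.01, 1.0)

# n_estimators is an illustrative choice, not a value from the patent.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict Kd for new (here: reused) product and angle values.
pred = model.predict(X[:5])
```

Because each tree predicts a mean of training targets, the forest's output stays inside the range of the training Kd values.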
During training, the random forest regression model randomly subsamples the input data set and builds a separate decision tree on each subsample. Each tree's prediction is the mean of the training values in the leaf node reached from the root, and the final prediction is the average over all trees. Before training, the invention shuffles the data set to keep the results from overfitting, and it trains with K-fold cross-validation. In this embodiment, ten-fold cross-validation is used: the training data set is split into 10 parts, 9 for training and 1 for validation, and the training and validation parts are rotated until every part has been used for validation. This avoids large differences in model accuracy caused by the choice of training data. The random forest's own selection of data during training is also fully random. The invention monitored the learning process and found no serious overfitting, so no regularization was applied to the data set. Hyperparameters are tuned by grid search, which also avoids overfitting caused by the model's own parameters.
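The combination of shuffling, ten-fold cross-validation, and grid search might look like the following sketch. The parameter grid, the scoring choice, and the synthetic data are assumptions; the patent does not list the hyperparameters it searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(1)
X = rng.uniform(0.1, 1.0, size=(200, 4))  # synthetic features
y = X[:, :3].mean(axis=1)                 # synthetic target

# shuffle=True randomizes the row order before the data is cut into 10 folds.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Illustrative grid only; the searched values are not given in the source.
param_grid = {"n_estimators": [20, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

# The scorer is negated RMSE, so flip the sign to report RMSE.
best_rmse = -search.best_score_
```

`search.best_params_` then holds the grid point with the lowest mean validation RMSE across the ten folds.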
To verify that the Kd predicted directly by the Kd random forest regression model is accurate, Experiment 1 is described below.
Experiment 1:
The experiment builds single-product RFM models and a multi-product RFM model. A single-product RFM model uses only one of CERES, BESS, or MERRA2 together with the site data; the multi-product RFM model uses all three together with the site data. Comparing the prediction accuracy of the multi-product model with that of the single-product models demonstrates the accuracy and reliability of predicting Kd with an RFM model trained on satellite, reanalysis, and site data. Prediction accuracy is checked by ten-fold cross-validation: the data set is split into 10 parts (9 for training, 1 for validation), and the validation part is rotated until every part has been used for validation. Five statistical indicators are used to evaluate the RFM models: the correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), mean square error (MSE), and coefficient of determination (R²). R² normally falls between 0 and 1 and measures the goodness of fit of the regression model on the sample. Comparing R² and RMSE between the single-product and multi-product models shows the difference between them, while RMSE and MAE express how far the predictions deviate from the observations.
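For reference, the five indicators can be computed with the standard definitions as in the sketch below. The function name and the sample values are illustrative; R² is taken as 1 minus the ratio of residual to total sum of squares, which is an assumption about the definition used in the experiment.

```python
import math


def evaluate(observed, predicted):
    """Return R, R2, MSE, RMSE, and MAE for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(ss_tot * ss_pred),  # Pearson correlation coefficient
        "R2": 1 - (mse * n) / ss_tot,            # 1 - SS_res / SS_tot
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
    }


# Made-up Kd values: every prediction is 0.1 too high.
metrics = evaluate([0.4, 0.5, 0.6], [0.5, 0.6, 0.7])
```

In this made-up case R is 1 because the predictions track the observations perfectly in shape, while RMSE and MAE both equal the constant offset of 0.1, which is why the experiment reports error and correlation indicators together.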
Table 1 lists the evaluation indicators obtained for the single-product and multi-product RFM models:
Table 1
As Table 1 shows, MSE, RMSE, and MAE are small for both the single-product and multi-product RFM models, but the multi-product model's predictions correlate best with the site observations: the average R and the average R² of each of the three single-product models are lower than those of the multi-product model. The MSE, RMSE, and MAE of the multi-product model are also all lower than those of the three single-product models. The box plots in Figures 3(a)-3(e), which compare each indicator for direct Kd prediction by the multi-product and single-product models, show the same result more directly: the multi-product model has lower MSE, RMSE, and MAE and higher R and R² than each of the three single-product models. The results in Table 1 and Figures 3(a)-3(e) therefore show that the multi-product RFM model predicts better than an RFM model built from a single product, and that the scattered radiation ratio it predicts is more accurate. They also demonstrate the reliability and practicality of predicting the scattered radiation ratio directly with the method of the invention, which can provide strong data support for subsequent research on the scattered radiation ratio.
Because Table 1 and Figures 3(a)-3(e) show that the multi-product RFM model predicts better than the single-product models, the per-site mean Kd predicted by the multi-product model is compared with the per-site mean of the measured Kd for the 39 sites, as shown in Figure 2. Apart from some sites with very high or very low observed values, where the prediction is unsatisfactory, other sites such as CA-Qfo, DK-Sor, and GF-Guy fit well; the difference may arise because the random forest regression model performs differently across regions. Overall, except for the sites with very high or very low observed values, the predictions at most sites fit the measurements well, which shows that direct Kd prediction with the multi-product RFM model has practical value.
Embodiment of indirect Kd prediction:
The invention provides a method for predicting the scattering ratio of photosynthetically active radiation by fusing multi-source data. Figure 1 shows the procedure for predicting Kd indirectly by building PAR and PARdif random forest regression models. First, the PAR and PARdif data of the site, satellite, and reanalysis data sets are acquired, together with the angle data for the times and locations covered by the satellite and reanalysis data sets. The satellite and reanalysis data sets are then matched in space and time to the site coordinates and site acquisition times, and the angle data for each matched time and location are converted to the solar zenith angle cosine. The site PAR and PARdif, the satellite PAR and PARdif, the reanalysis PAR and PARdif, and the solar zenith angle cosine common to all three form the multi-source PAR and PARdif training data set. This data set is fed into random forest decision trees for training, giving PAR and PARdif random forests that predict PAR and PARdif at the target time and location. The scattered radiation ratio at the target time and coordinates is then calculated from the predicted PAR and PARdif, which addresses the low Kd accuracy caused by the low accuracy of the PAR and PARdif provided by existing satellite data.
Step 1. Acquire data
1) Site measurement data set
The site measurements acquired by the invention come from the international flux observation network data set (FLUXNET), as described in the direct Kd prediction embodiment. This embodiment uses daily PAR (μmol m⁻² s⁻¹) and PARdif (μmol m⁻² s⁻¹) data for 2000-2010 from 42 sites in the FLUXNET2015 product. PAR is photosynthetically active radiation, and PARdif is the scattered component of photosynthetically active radiation.
2) Satellite data set
The satellite data acquired by the invention are at least one of the CERES and BESS data sets, as described in the direct Kd prediction embodiment. This embodiment uses PARdif (W m⁻²) and the calculated PAR (W m⁻²) from the CERES product data set, and the 5 km resolution PAR (mol m⁻² d⁻¹) and PARdif (mol m⁻² d⁻¹) for 2000-2010 from BESS_Rad.
3) Reanalysis data set
The reanalysis data acquired by the invention are the MERRA2 reanalysis data set, as described in the direct Kd prediction embodiment. This embodiment uses the PARdif (W m⁻²) and PAR (W m⁻²) data from the MERRA2 reanalysis data set.
4) Angle data
The angle data acquired by the invention include the acquisition time, longitude and latitude, solar altitude, and solar declination, as described in the direct Kd prediction embodiment. The cosine of the solar altitude angle is calculated from these data and used to represent time, longitude, and latitude.
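The other steps of this embodiment use the cosine of the solar zenith angle as the angle feature. As a hedged illustration, the standard astronomical relations below compute that cosine either from latitude, declination, and hour angle, or from the solar altitude (the zenith angle is 90° minus the altitude). The exact conversion used by the invention is given in the direct prediction embodiment and is not reproduced here, so the function names and inputs are assumptions.

```python
import math


def cos_solar_zenith(lat_deg, declination_deg, hour_angle_deg):
    """cos(theta_z) = sin(lat) sin(dec) + cos(lat) cos(dec) cos(hour angle)."""
    lat, dec, h = (math.radians(v) for v in (lat_deg, declination_deg, hour_angle_deg))
    return math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(h)


def cos_zenith_from_altitude(altitude_deg):
    """Zenith angle = 90 deg - altitude, so cos(zenith) = sin(altitude)."""
    return math.sin(math.radians(altitude_deg))


# Sun directly overhead: equator, zero declination, solar noon.
overhead = cos_solar_zenith(0.0, 0.0, 0.0)
# Latitude 45 deg N at solar noon on an equinox: the zenith angle is 45 deg.
mid_latitude = cos_solar_zenith(45.0, 0.0, 0.0)
```

The hour angle encodes local solar time (0° at solar noon, 15° per hour), which is how a single cosine value can stand in for time, latitude, and longitude together.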
Step 2. Data processing
To keep data selection and subsequent model building accurate, the invention first extracts the satellite and reanalysis data that match the site data in space and time, giving spatiotemporally matched multi-source PAR and PARdif data. The coordinates of PAR and PARdif must also be unified across all site, satellite, and reanalysis data sets. The matched data are then quality-screened, and the acquired angle data are used to express the three features time, longitude, and latitude as the solar zenith angle cosine. The spatiotemporal matching of satellite and site data, the unit conversion, the angle data processing, and the data screening are carried out as described in the direct Kd prediction embodiment. In the end, the invention uses the PARdif and PAR data of 39 sites together with the corresponding preprocessed satellite PARdif and PAR data and reanalysis PARdif and PAR data.
Step 3. Build and train the random forest decision trees
The data obtained in step 2 are assembled into the corresponding data sets. The PARdif and PAR data of the FLUXNET sites, the PARdif and PAR data of at least one satellite data set, the PARdif and PAR data of the MERRA2 reanalysis data set, and the solar zenith angle cosine common to all three form the training set. The training set is fed into the RFM model to train the decision trees and produce the final PARdif and PAR random forests, which can then predict PARdif and PAR for a region or for the globe. K-fold cross-validation is used during model training.
Step 4. Predict the scattered radiation ratio
The scattered radiation ratio Kd is calculated from the PARdif and PAR predicted in step 3 according to formula (6).
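The indirect route of steps 3 and 4 can be sketched as follows: one regressor for PAR, one for PARdif, and their ratio as Kd. Training two separate `RandomForestRegressor` instances is an assumption, since the text does not say whether one model or two are used, and all numbers are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 400

# Synthetic "site" targets (not real measurements).
cos_sza = rng.uniform(0.2, 1.0, n)
site_par = 400.0 * cos_sza + rng.normal(0, 5, n)
site_par_dif = site_par * rng.uniform(0.4, 0.6, n)

# Synthetic product features: noisy versions of the site values plus cos(SZA).
X = np.column_stack([
    site_par + rng.normal(0, 20, n),      # product PAR
    site_par_dif + rng.normal(0, 10, n),  # product PARdif
    cos_sza,
])

# One forest per target variable (an illustrative design choice).
par_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, site_par)
dif_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, site_par_dif)

# Step 4: Kd = predicted PARdif / predicted PAR.
par_hat = par_model.predict(X)
dif_hat = dif_model.predict(X)
kd_hat = dif_hat / par_hat
```

Because the two forests are fitted independently, the ratio is not guaranteed to stay within (0, 1]; a range check like the one applied to the training data may still be useful on the predicted Kd.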
Experiment 2:
The invention builds RFM models of PARdif and PAR, predicts PAR and PARdif for different times and locations, and then obtains the corresponding scattered radiation ratio indirectly by calculation. To verify the practicality of this indirect prediction, the experiment groups the indirectly obtained Kd by the IGBP classification of each site and analyzes the results, giving the distribution of Kd under different vegetation types shown in Figure 4. In Figure 4, the stars mark the mean Kd, the vegetation types of the IGBP classification are listed to the left of the Y axis, and the shaded band is the interval in which Kd is concentrated (0.4-0.6). Under the IGBP classification, the 266 FLUXNET sites fall mainly into eleven vegetation types: deciduous broadleaf forest (DBF, 27), cropland (CRO, 29), evergreen needleleaf forest (ENF, 55), evergreen broadleaf forest (EBF, 18), grassland (GRA, 46), savanna (SAV, 9), woody savanna (WSA, 7), mixed forest (MF, 9), open shrubland (OSH, 14), wetland (WET, 44), and others (OTHERS, 8). The distribution of PAR absorption differs greatly among vegetation types, and the mean Kd rises as tree cover increases: comparing SAV and WSA with ENF, DBF, EBF, and MF, the mean increases by 25.6% ± 3.65%. Globally, the mean Kd of the FLUXNET sites is 0.4964, with an overall range of 0.4-0.6 (shaded band in Figure 4), which agrees well with actual research results. This demonstrates that indirect prediction of the scattered radiation ratio with this method is reliable and can be applied to subsequent regional or global studies of the scattered radiation ratio.
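The per-vegetation-type summary behind this analysis amounts to grouping site-level Kd by IGBP class and averaging. A small sketch with invented Kd values follows; the function name and the numbers are illustrative and do not reproduce the experiment's data.

```python
from collections import defaultdict


def mean_kd_by_type(site_kd):
    """Average Kd per IGBP class from (igbp_class, kd) pairs."""
    groups = defaultdict(list)
    for igbp, kd in site_kd:
        groups[igbp].append(kd)
    return {igbp: sum(values) / len(values) for igbp, values in groups.items()}


# Invented example values, one tuple per site.
site_kd = [("ENF", 0.5), ("ENF", 0.6), ("SAV", 0.4), ("WSA", 0.44)]
means = mean_kd_by_type(site_kd)

# Relative increase of the forest mean over the savanna mean, in percent.
increase_pct = (means["ENF"] - means["SAV"]) / means["SAV"] * 100
```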
Kd is predicted with each of the two methods above (direct and indirect prediction), and the results of direct prediction (Experiment 1) and indirect prediction (Experiment 2) are compared. Figure 5 shows the scatter plot of directly predicted Kd against PAR, and Figure 6 shows the scatter plot of indirectly predicted Kd against PAR. In Figures 5 and 6, the sloped line is the linear regression of Kd and the shaded band is the region in which Kd is distributed. The Kd values predicted in both experiments are concentrated between 0.4 and 0.6, and Kd decreases slowly as PAR increases; the two experiments show the same trend. The mean Kd predicted in Experiment 1 is 0.5125 and that in Experiment 2 is 0.4907, a difference of only about 0.02. Direct and indirect prediction therefore give closely similar results, and both can be applied to subsequent studies of the scattered radiation ratio.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210010803.XA CN114357885B (en) | 2022-01-06 | 2022-01-06 | Photosynthetic effective radiation scattering proportion prediction method fusing multisource data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114357885A CN114357885A (en) | 2022-04-15 |
| CN114357885B true CN114357885B (en) | 2024-03-01 |
Family
ID=81107179
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210010803.XA Active CN114357885B (en) | 2022-01-06 | 2022-01-06 | Photosynthetic effective radiation scattering proportion prediction method fusing multisource data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114357885B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114966892B (en) * | 2022-05-06 | 2023-09-29 | 中国气象局气象探测中心 | Star-to-ground total radiation observation data matching and evaluation methods and systems, media and equipment |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2012109273A2 (en) * | 2011-02-08 | 2012-08-16 | Rapiscan Systems, Inc. | Covert surveillance using multi-modality sensing |
| CN109213964A (en) * | 2018-07-13 | 2019-01-15 | 中南大学 | A kind of satellite AOD product bearing calibration for merging multi-source feature geographic factor |
| CN112162979A (en) * | 2019-11-14 | 2021-01-01 | 湖南国天电子科技有限公司 | Ocean sediment test system and method based on deep learning |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11408978B2 (en) * | 2015-07-17 | 2022-08-09 | Origin Wireless, Inc. | Method, apparatus, and system for vital signs monitoring using high frequency wireless signals |
| CN110097611B (en) * | 2019-04-28 | 2023-09-22 | 上海联影智能医疗科技有限公司 | Image reconstruction method, device, equipment and storage medium |
Non-Patent Citations (3)
| Title |
|---|
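| Fusion and evaluation of multi-source surface shortwave radiation data; Liu Junjian; Shi Chunxiang; Han Shuai; Jiang Zhiwei; Zhang Tao; Remote Sensing Technology and Application; 2018-10-20 (05); full text * |
| Simulation of snow albedo and snow grain size under black carbon deposition in arid regions; Cao Xiaoyi; Ding Jianli; Chen Wenqian; Wang Xin; Cui Jiecan; Zhang Zhe; China Environmental Science; 2020-06-20 (06); full text * |
| Tian Ye; Guo Ziqi; Qiao Yanchao; Lei Xia; Xie Fei. Remote-sensing-based water quality monitoring of Guanting Reservoir. Acta Ecologica Sinica. (07), full text. * |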
Also Published As
| Publication number | Publication date |
|---|---|
| CN114357885A (en) | 2022-04-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||