CN105631203A - Methods to Identify Sources of Heavy Metal Pollution in Soil

Info

Publication number: CN105631203A
Application number: CN201510988451.5A
Authority: CN
Inventors: 陈锋; 张云峰; 曹张伟; 戈源运; 刘晓立; 王红梅
Original assignee: North China Institute of Aerospace Engineering
Current assignee: North China Institute of Aerospace Engineering
Priority date: 2015-12-27
Filing date: 2015-12-27
Publication date: 2016-06-01

Abstract

The invention discloses a method for identifying sources of heavy metal pollution in soil based on K-means clustering. The method identifies pollution sources with a composite model that combines K-means clustering and principal component analysis. It can quickly and accurately trace the origin of heavy metal pollutants and give the contribution rate of each pollutant, providing reliable technical support for environmental management departments in responding to pollution accidents and controlling pollution risks. It overcomes the defect of existing techniques, which cannot give the contribution of specific emission sources to receptors and therefore offer no practical guidance for pollution prevention and control work.

Description

Methods to Identify Sources of Heavy Metal Pollution in Soil

Technical field

The invention belongs to the technical field of heavy metal pollution source apportionment, and in particular relates to a method for identifying the sources of heavy metals in soil based on K-means cluster analysis.

Background

Pollution source identification is a method for discriminating, apportioning, and evaluating the sources of pollutants. Current pollution source identification techniques fall roughly into three types: the inventory analysis method, diffusion models, and receptor models. The inventory analysis method is a source apportionment method that builds an inventory model by observing and simulating the source emissions, emission characteristics, and geographical distribution of emissions of pollutants. A diffusion model is a predictive model: it predicts the spatial and temporal variation of pollutants from the emission data of each pollution source and the related parameter information. A receptor model is a class of techniques that determines the contribution rate of each pollution source through chemical and microscopic analysis of receptor samples; its ultimate goal is to identify the pollution sources that contribute to the receptor and to quantitatively calculate the share of each pollution source.

At present, there are few studies on identifying heavy metal pollution sources in soil. The main identification methods either compare source profiles and factor loadings qualitatively by visual inspection of their plots, or compare them semi-quantitatively by calculating the deviation between source profiles and factor loadings. Most of these methods do not consider the nonlinear characteristics of pollution source profiles, so the identification results cannot truly reflect the correspondence between factor loadings and pollution source profiles.

Summary of the invention

The present invention aims to provide a method for identifying the sources of heavy metals in soil based on K-means cluster analysis, which overcomes the defect that traditional heavy metal source apportionment methods cannot give the contribution rate of specific emission sources.

The inventors provide the following technical solution.

A method for identifying heavy metal pollution sources in soil, the operating steps comprising:

Step 1: determine the investigation area for heavy metal pollution sources.

The investigation area can be selected with reference to the overall urban plan and the industrial layout: choose an area with more than three types of heavy-metal-emitting pollution sources whose emitted heavy metals include at least the five major heavy metal pollutants lead, mercury, chromium, arsenic, and cadmium.

Step 2: conduct an investigation within the determined investigation area. The investigation process includes:

(1) Collection of basic data

By collecting, organizing, and analyzing relevant materials (such as public complaints, the pollution source census database, pollution source files, environmental monitoring data, and environmental impact assessment reports), establish the distribution of heavy-metal-polluting industries and enterprises in the investigation area, screen out the representative enterprises with the most prominent impact, and determine the list of pollution sources to be investigated further.

(2) Field investigation (monitoring)

Carry out a field investigation of the major heavy metal pollutants in the investigation area, including on-site layout of sampling points, sampling, and analytical testing. Determine the point layout and sampling methods according to the production technology and production flow of each pollution source, the generation mechanism and discharge form of its pollutants, and other such factors, with reference to the pollution source investigation specifications. The monitoring indicators include component concentration indicators.

(3) Data processing and analysis

The data obtained from the field investigation are sorted into categories and statistically analyzed using scientific statistical methods, taking into account the characteristics of the data and the purpose of the investigation.

(4) Build a pollution source information database from the processed and analyzed data.

Step 3: on the basis of the investigation of the area, analyze the impact of pollution sources on the environment under different situations.

The situations include: ① a single pollution source located at an environmentally sensitive point; ② multiple pollution sources of different types located at an environmentally sensitive point; ③ multiple pollution sources of the same type located at an environmentally sensitive point. In case ①, a monitoring plan is formulated according to the relative position of the pollution source and the environmentally sensitive point, and the degree of influence of the source on the sensitive point is analyzed. In case ②, the sources are analyzed and distinguished according to the characteristic pollutants of each source. Case ③ is more complicated: the source strength of each pollution source must be measured, and the magnitude of each source's influence is determined with the help of a mathematical model.

Step 4: identify the characteristic heavy metal markers in the various emission sources.

Based on the measured values of the major heavy metal pollutants obtained in the field investigation, the content of each heavy metal component in the target pollution (the objective indicator) is considered comprehensively, and the pollution type is determined from its source composition profile.

The emission sources include: industrial "three wastes" (waste gas, wastewater, and solid waste), vehicle exhaust, municipal solid waste, sludge applied to farmland, organic fertilizers, and pesticides and chemical fertilizers.

Step 5: apply the K-means clustering method, programmed in Matlab. Convert the sampling-point layout and monitoring data from the field investigation into a numerical matrix that the computer can process, then standardize the data to eliminate the influence of dimensions (units), yielding a standardized matrix.
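The standardization in Step 5 can be sketched as follows. This is a minimal illustration in Python rather than the Matlab program the method refers to, and it assumes column-wise z-score standardization, which the text does not specify. The function name `standardize` and the sample values are hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract each column's mean and divide by
    its sample standard deviation (assumed form of standardization)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # assumption: a constant column is left at zero
    return (X - mean) / std

# Hypothetical concentrations, NOT data from the patent:
# rows are sampling points, columns are heavy metals.
X = np.array([[10.0, 0.10, 0.5],
              [12.0, 0.30, 0.7],
              [ 8.0, 0.20, 0.9],
              [14.0, 0.40, 0.3]])
Z = standardize(X)
print(Z.round(3))
```

After this step every column has zero mean and unit sample standard deviation, so metals measured on very different concentration scales carry equal weight in the distance calculations that follow.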

Step 6: build a model for identifying heavy metal pollution sources based on the K-means clustering method, including:

(I) Classifying pollution sources by K-means cluster analysis

The first step: preprocessing and initialization.

The second step: output the training sample pairs.

The core idea of the K-means algorithm is to partition n data objects into k clusters so that the sum of squared distances from the data points in each cluster to that cluster's center is minimized. The algorithm proceeds as follows:

Input: the number of clusters k and a data set containing n data objects.

Output: k clusters.

(1) Arbitrarily choose k of the n data objects as the initial cluster centers.

(2) Assign each remaining object to the nearest cluster according to its distance from each cluster center.

(3) Recalculate each cluster center (i = 1, 2, …, n; j = 1, 2, …, k) as the mean of the objects currently assigned to it, and calculate the value of the criterion function at this point.

(4) Calculate the new assignment: if a sample currently in class n is closer to the center of another class m, assign the sample to class m, then calculate the value of the criterion function after this reassignment.

(5) If the criterion function value has stopped changing, stop the calculation; otherwise set c = c + 1 and repeat steps (3), (4), and (5).

For large data sets, the K-means algorithm is relatively scalable and efficient: its computational cost is on the order of O(nkt), where n is the number of objects, k is the number of clusters, and t is the number of iterations; usually k ≪ t and t ≪ n. K-means clusters well when the resulting clusters are dense and clearly separated from one another.
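The K-means procedure described above can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard algorithm, not the patent's Matlab program. It assumes Euclidean distance, takes the criterion function to be the within-cluster sum of squared distances, and stops when that value no longer decreases; the function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-9, seed=0):
    """Standard K-means with Euclidean distance (assumed)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Pick k distinct objects as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_j = np.inf
    for _ in range(max_iter):
        # Assign every object to its nearest center.
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Criterion function: within-cluster sum of squared distances.
        j = dist2[np.arange(len(X)), labels].sum()
        # Stop once the criterion no longer decreases.
        if prev_j - j <= tol:
            break
        prev_j = j
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return labels, centers, j

# Hypothetical, well-separated points (not data from the patent).
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers, j = kmeans(X, k=2)
print(labels, j)
```

Each iteration computes n × k distances, which is where the O(nkt) cost comes from. Because the initial centers are chosen at random, the numbering of the clusters can differ between runs even when the grouping is the same.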

(II) Calculating the contribution rate of each category of pollution source by principal component analysis

The first step: data standardization.

This comprises three processes: data review, selection of pollutant variables, and standardization of receptor concentration data.

Data review: identifying, assessing, and handling non-detects, missing values, and outliers.

Selection of pollutant variables: a signal-to-noise ratio is introduced. If a pollutant's signal-to-noise ratio is too small, or a large proportion of its values are below the detection limit, it cannot be used in the factor analysis.

Data standardization: each receptor concentration variable j (j = 1, 2, …, p) is standardized to remove the influence of dimensions.

The second step: calculate the correlation coefficient matrix of the samples.

The third step: calculate the eigenvalues of the correlation coefficient matrix and the corresponding eigenvectors.
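In standard textbook notation, the three steps above can be written as follows. The symbols are assumptions rather than the patent's own notation: x_{ij} is the concentration of pollutant j in sample i, with n samples and p pollutant variables, R is the correlation coefficient matrix, and λ_j and u_j are its eigenvalues and eigenvectors.

```latex
% Step 1: z-score standardization (assumed form)
x^{*}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2},
\qquad j = 1, 2, \ldots, p

% Step 2: correlation coefficient matrix
R = (r_{jk})_{p \times p}, \qquad
r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x^{*}_{ij}\, x^{*}_{ik}

% Step 3: eigenvalues and eigenvectors
\lvert R - \lambda I \rvert = 0, \qquad
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0, \qquad
R\,u_j = \lambda_j u_j

% Contribution rate of the j-th principal component
\frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k} = \frac{\lambda_j}{p}
```

The last line uses the fact that the diagonal of a correlation matrix is all ones, so its eigenvalues sum to p.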

Step 7: use the constructed model to identify heavy metal pollution sources, including:

(1) Use principal component factor analysis to extract the factor loading matrix and the factor score matrix, and determine the number of principal component factors.

(2) Treat the identification of factor loadings against pollution source composition profiles as a multi-parameter pattern recognition problem, and use K-means cluster analysis to identify the pollution sources.

(3) Finally, use the resulting classification model to calculate the pollution source contribution rates of the factor loadings, achieving source apportionment of the characteristic heavy metal pollutants.
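The extraction of loadings and scores in item (1) can be illustrated with a short NumPy sketch. It assumes principal components are taken from the correlation matrix, that loadings are eigenvectors scaled by the square root of their eigenvalues, and that scores are projections of the standardized data; the patent does not state these conventions explicitly. The function name `pca_loadings` and the random input are hypothetical.

```python
import numpy as np

def pca_loadings(X, n_components):
    """PCA on the correlation matrix (assumed conventions, see lead-in)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]         # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = 100.0 * eigvals / eigvals.sum()   # percent of total variance
    kept = np.clip(eigvals[:n_components], 0.0, None)
    loadings = eigvecs[:, :n_components] * np.sqrt(kept)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scores = Z @ eigvecs[:, :n_components]    # component scores per sample
    return eigvals, contribution, loadings, scores

# Hypothetical data: 10 sampling points x 5 metals, random values,
# NOT the measurements used in the patent's example.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
eigvals, contribution, loadings, scores = pca_loadings(X, n_components=2)
print(contribution.round(2))
print(loadings.round(3))
```

Each row of `loadings` describes how strongly one metal is tied to each retained component. Those rows are the kind of multi-parameter pattern that item (2) proposes to group with K-means.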

The heavy metal pollutants considered in the present invention are selected according to the following principles: (1) heavy metals whose discharge is restricted by domestic and foreign regulations and standards; (2) heavy metals that are widely present in various pollution sources or are characteristic pollutants of an industry; (3) heavy metals for which reliable monitoring methods exist. On these principles, five heavy metal pollutants that are particularly harmful to human health were selected: lead, mercury, chromium, arsenic, and cadmium.

The advantages of the present invention are:

(1) The method can quickly and accurately trace the source of heavy metal pollutants. It is highly practical and has broad value for wider application, and it provides reliable technical support for environmental management departments in responding to pollution accidents and controlling pollution risks.

(2) Traditional pollutant source apportionment techniques can only roughly indicate the categories of pollution sources that contribute most to environmental receptors; they cannot give the magnitude of the contribution of specific emission sources to receptors, and so offer little practical guidance for pollution prevention and control. The method of the invention comprehensively reveals the compositional characteristics of heavy metal source emissions and screens out characteristic markers that can indicate the pollution source.

(3) The invention provides technical support for formulating regional pollution control measures and improving regional environmental quality. When environmental management departments face pollution problems in the future, they can use a systematic and complete source apportionment method and the corresponding data information system to identify pollution sources quickly and thereby prevent and control pollution.

Description of drawings

Figure 1 is a flow chart of the method of the present invention.

Detailed description

The content of the present invention is described in further detail below with reference to a specific embodiment.

Example

Step 1: the Jinjiang River basin is taken as the investigation area for heavy metal pollution.

Step 2: the data are the heavy metal contents of sediments in the Jinjiang River basin; surface sediment samples were collected at 10 stations with a grab sampler.

Step 3: based on the investigation area, the heavy metals to be investigated are determined to be As, Hg, Cd, Cr, and Pb, with 10 monitoring points in total.

Steps 4 to 7 are covered by the data analysis below. Data standardization and cluster analysis were carried out with the SPSS statistical analysis software and programs written in Matlab. The main results are as follows:

K-means cluster analysis was performed on As, Hg, Cd, Cr, and Pb, with the following results:

Tables 1 and 2 give the initial and final cluster centers, respectively. These are in effect the reference concentrations of two classes (the two classes were separated by the SPSS statistical analysis software; 1 denotes the first class and 2 the second).

Table 1 Initial cluster centers

Table 2 Final cluster centers

Table 3 is the analysis of variance table, which tests whether each clustering variable is statistically significant. As the table shows, the p-values (Sig.) of the two clustering variables in this example are very small, so both variables can be judged meaningful for classifying the data in this case.

Table 3 Analysis of variance

K-means clustering divides the heavy metal pollution sources in this example into two major classes.

Principal component analysis was then performed on As, Hg, Cd, Cr, and Pb. Table 4 gives the correlation coefficients and the corresponding p-values:

Table 4 Correlation coefficient matrix

As shown in Table 5, the statistics of the principal components are listed in descending order of eigenvalue. The first principal component has an eigenvalue of 3.576 and explains 71.518% of the total variance. The second principal component has an eigenvalue of 0.701, which is less than 1 but close to it, so it is also retained; it explains 14.016% of the total variance. The cumulative contribution rate then reaches 85.535%, so the first two principal components are selected in this example.
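The percentages quoted from Table 5 can be checked from the eigenvalues alone. When principal components are computed from a correlation matrix, the eigenvalues sum to the number of variables (here 5: As, Hg, Cd, Cr, Pb), so each component explains its eigenvalue divided by 5. The short check below uses only the two eigenvalues given in the text; the variable names are illustrative.

```python
# Eigenvalues of the first two principal components, as quoted in the text.
eigenvalues = [3.576, 0.701]
n_variables = 5  # As, Hg, Cd, Cr, Pb

# Share of total variance explained by each component, in percent.
ratios = [100.0 * ev / n_variables for ev in eigenvalues]
cumulative = sum(ratios)

for i, r in enumerate(ratios, start=1):
    print(f"PC{i}: {r:.2f}%")
print(f"cumulative: {cumulative:.2f}%")
```

The results come to about 71.52%, 14.02%, and 85.54%, which agree with the reported 71.518%, 14.016%, and 85.535% to within the rounding of the eigenvalues to three decimals.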

Table 5 Total variance explained

As shown in Table 6, with the number of principal components fixed at 2, the data are analyzed again with 2 entered as the number of components to extract, giving the factor loading matrix. The first principal component mainly carries the information of the original variables As, Cd, and Pb, indicating that the pollution sources behind the first principal component are mainly chemical fertilizers, pesticides, and domestic sewage. The second principal component carries most of the information on Hg, indicating that its pollution sources are mainly industrial wastewater from chlor-alkali, plastics, battery, and electronics production.

Table 6 Factor loading matrix

Claims

1. A method for identifying heavy metal pollution sources in soil, characterized in that the operating steps comprise:

Step 1, determine the investigation area of heavy metal pollution sources;

Step 2: Conduct investigation in the determined investigation area of heavy metal pollution sources. The investigation process includes:

(1) Collection of basic data

(2) Field investigation

Carry out on-the-spot investigation of major heavy metal pollutants in the investigation area, including on-site layout, sampling and analysis and testing;

(3) Data processing and analysis

Classify and organize and statistically analyze the data obtained from field surveys;

(4) Establish a pollution source information database with the processed and analyzed data;

Step 3. Based on the regional investigation of heavy metal pollution sources, analyze the impact of pollution sources on the environment under different circumstances;

Different situations include: ① a single pollution source is located in an environmentally sensitive point; ② multiple pollution sources of different types are located in an environmentally sensitive point; ③ multiple pollution sources of the same type are located in an environmentally sensitive point;

Step 4, identify the heavy metal characteristic markers in various emission sources;

Step 5: Apply the K-means clustering method and use Matlab software programming to convert the site layout and monitoring data in the field survey into a quantitative matrix that can be accepted by the computer, standardize the data, eliminate the influence of dimensions, and obtain a standardized matrix;

Step six, build a model for identifying heavy metal pollution sources based on the K-means clustering method, including

(I) Classifying pollution sources by K-means cluster analysis

The first step, preprocessing and initialization

The second step is to output training sample pairs

The core idea of the K-means algorithm is to divide n data objects into k clusters, so that the sum of squares from the data points in each cluster to the cluster center is the smallest. The algorithm processing process:

Input: the number of clusters k, a data set containing n data objects;

Output: k clusters;

(1) Choose k objects among n data objects as the initial clustering center

(2) For each remaining object, assign it to the nearest cluster according to its distance from the center of each cluster;

(3) Recalculate each cluster center (i = 1, 2, …, n; j = 1, 2, …, k) as the mean of the objects currently assigned to it, and calculate the value of the criterion function at this point;

(4) Calculate the new assignment: if a sample currently in class n is closer to the center of another class m, assign the sample to class m, then calculate the value of the criterion function after this reassignment;

(5) If the criterion function value has stopped changing, stop the calculation; otherwise set c = c + 1 and repeat steps (3), (4), and (5);

For large data sets, the K-means algorithm is relatively scalable and efficient, where n is the number of objects, k is the number of clusters, and t is the number of iterations; usually k ≪ t and t ≪ n. K-means clusters well when the resulting clusters are dense and clearly separated from one another;

(II) Calculating the contribution rate of each category of pollution source by principal component analysis

The first step, data standardization

Including data review, selection of pollutant variables and standardization of receptor concentration data;

Data review: including identification, judgment and processing of undetected items, missing items, and outliers;

Introduce the signal-to-noise ratio. If the signal-to-noise ratio of a certain pollutant is too small or the proportion below the detection limit is large, it cannot be used for factor analysis;

Data standardization: each receptor concentration variable j (j = 1, 2, …, p) is standardized to remove the influence of dimensions;

The second step: calculate the correlation coefficient matrix of the samples;

The third step: calculate the eigenvalues of the correlation coefficient matrix and the corresponding eigenvectors;

Step seven, use the constructed heavy metal pollution source identification model to identify heavy metal pollution sources, including:

(1) Use principal component factor analysis to extract the factor loading matrix and the factor score matrix, and determine the number of principal component factors;

(2) Treat the identification of factor loadings against pollution source composition profiles as a multi-parameter pattern recognition problem, and use K-means cluster analysis to identify the pollution sources;

(3) Finally, use the resulting classification model to calculate the pollution source contribution rates of the factor loadings, achieving source apportionment of the characteristic heavy metal pollutants.

2. The method for identifying heavy metal pollution sources in soil according to claim 1, wherein the basic data in step 2 include public complaints, pollution source census databases, pollution source files, environmental monitoring data, and environmental impact assessment reports.

3. The method for identifying heavy metal pollution sources in soil according to claim 1, characterized in that the emission sources described in step 4 include: industrial "three wastes", vehicle exhaust, municipal solid waste, sludge applied to farmland, organic fertilizers, and pesticides and chemical fertilizers.