
CN103258069B - A kind of Forecasting Methodology of steel industry electricity needs - Google Patents

Info

Publication number
CN103258069B
Authority
CN
China
Prior art keywords
model
electricity consumption
test
regression
steel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210504051.9A
Other languages
Chinese (zh)
Other versions
CN103258069A (en
Inventor
张维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Economic and Technological Research Institute of State Grid Hubei Electric Power Co Ltd
Original Assignee
WUHAN CENTRAL CHINA POWER GRID CO Ltd
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN CENTRAL CHINA POWER GRID CO Ltd, State Grid Corp of China SGCC filed Critical WUHAN CENTRAL CHINA POWER GRID CO Ltd
Priority to CN201210504051.9A priority Critical patent/CN103258069B/en
Publication of CN103258069A publication Critical patent/CN103258069A/en
Application granted granted Critical
Publication of CN103258069B publication Critical patent/CN103258069B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for forecasting electricity demand in the iron and steel industry, within the field of power system load forecasting. The method classifies the factors that influence the industry's electricity consumption, analyzes the key factors with a combination of qualitative and quantitative methods, proposes an index system for the industry's electricity consumption, and gives a demand forecasting model. It provides power companies with a scientific and practical technical solution for accurately forecasting the electricity demand of related industries and, from that, the electricity demand of society as a whole. The advantages of the invention are that it enriches and improves existing electricity demand forecasting methods, is significant for raising the forecasting capability of power companies, and can yield notable economic and social benefits.

Description

A Method for Forecasting Electricity Demand in the Iron and Steel Industry

Technical field

The invention relates to load forecasting for power systems, and in particular to forecasting electricity demand in the iron and steel industry.

Background

Electricity demand forecasting is basic work for the planning, scheduling, consumption management, and dispatching departments of a power system. As the power industry moves to market-oriented operation, it has also become a core task for the market trading and marketing departments.

The four major energy-intensive industries (iron and steel, non-ferrous metals, chemicals, and non-metallic mineral products) account for about one third of total electricity consumption. Their consumption also swings sharply, so its rise and fall has an important effect on total electricity demand. Electricity consumption in energy-intensive industries has the following characteristics:

(1) It accounts for a large share of total electricity consumption and strongly affects consumption growth. In 2010, for example, the four major energy-intensive industries accounted for 32.1% of total electricity consumption in the operating area of the State Grid Corporation and contributed 34% of the growth in total consumption. Energy-intensive consumption therefore has an important influence on the electricity situation in the company's operating area, and forecasting it is of great significance for forecasting total consumption.

(2) It is strongly affected by the macro-economy, fluctuates widely, and is hard to forecast. Growth in electricity consumption is heavily influenced by the macro-economy, and changes in consumption correlate closely with economic conditions. Before May 2008, the monthly growth rate of consumption in energy-intensive industries was generally higher than that of total consumption. After May 2008, under the financial crisis, both total and energy-intensive consumption declined, but energy-intensive industries were hit harder and fell further. After the economy bottomed out and recovered in 2009, consumption in energy-intensive industries rebounded faster than total consumption. This tendency to swing sharply with economic conditions makes energy-intensive consumption harder to forecast; methods that extrapolate historical data in particular rarely give satisfactory results. It can be said that if consumption in energy-intensive industries can be forecast fairly accurately, electricity demand forecasting is half done.

(3) Changes in energy-intensive consumption lead changes in total consumption. Taking 2008 consumption data for the company's operating area as an example, consumption in energy-intensive industries peaked in May and declined month by month after June, whereas total and industrial consumption did not peak until July and only then began to fall. Energy-intensive consumption thus led total consumption by two months.

Traditional electricity demand forecasting techniques fall into two categories. The first relies on the historical data of electricity demand itself, for example extrapolation and time-series methods. The second considers correlation with influencing factors, for example the elasticity coefficient method, the method of electricity consumption per unit of output value, and regression analysis. None of these techniques takes the characteristics of energy-intensive consumption into account, so they cannot reflect the underlying pattern of its changes, and their analysis and forecasting results are unsatisfactory.

The domestic and international literature contains some qualitative analysis of the consumption characteristics of energy-intensive industries, but very little quantitative analysis, and no usable analysis or forecasting model.

Power companies increasingly recognize how important analyzing and forecasting energy-intensive consumption is for forecasting total consumption. Because of limitations in theory and method, however, their current analysis and forecasting of energy-intensive consumption still has the following shortcomings.

(1) Understanding of the factors that influence energy-intensive consumption needs to deepen, and no unified, scientific system of analysis indicators has been formed. The factors are complex; some are direct and some indirect. Because they are not well understood, organizations currently analyze energy-intensive consumption from very different angles: some analyze industry factors, some macroeconomic factors, some output, and others prices. Without a unified set of indicators, in-depth analysis of energy-intensive consumption is difficult.

(2) The quantitative relationship between energy-intensive consumption and its influencing factors has not been established. For example, it is well known that electricity consumption in the steel industry depends mainly on steel output, and that steel output depends in turn on steel demand from downstream industries such as real estate and automobiles. Yet no research results are available on the quantitative relationship between steel output and downstream demand. Most current analysis or forecasting of energy-intensive industries is therefore qualitative, and quantitative work is hard to carry out.

(3) Targeted forecasting methods are lacking. Accurately forecasting energy-intensive consumption is of great significance for improving the accuracy of total consumption forecasts. Although the literature contains many results on electricity demand forecasting methods, these methods largely ignore the characteristics of energy-intensive consumption and are neither well targeted nor very effective, which leaves analysts without effective methods and models when forecasting energy-intensive consumption. Even when some method is pressed into service and yields a forecast, the analyst knows the result but not the reason behind it, and the causes of changes in energy-intensive consumption remain unclear.

Summary of the invention

The object of the invention is to provide a method for forecasting electricity demand in the iron and steel industry. The method classifies the factors that influence the industry's electricity consumption, analyzes the key factors with a combination of qualitative and quantitative methods, proposes an index system for analyzing the industry's electricity consumption, and gives a demand forecasting model, thereby providing power companies with a scientific and practical technical solution for accurately forecasting the electricity demand of related industries and, from that, of society as a whole.

To achieve this object, the invention adopts the following technical scheme:

The factors influencing electricity consumption in the iron and steel industry are classified; the key factors are analyzed with a combination of qualitative and quantitative methods; the degree of correlation between each factor and electricity consumption is evaluated; and an index system for analyzing the industry's electricity consumption is proposed, which helps power companies grasp the underlying pattern of consumption growth in the industry.

Using a forecasting method suited to the characteristics of energy-intensive industries, an electricity demand forecasting model for the iron and steel industry is established, the validity of the model is tested, and its forecasting accuracy is evaluated. The model is well targeted and practical, and forecasts well.

The advantages of the invention are that it enriches and improves existing electricity demand forecasting methods, is significant for raising the forecasting capability of power companies, and can yield notable economic and social benefits:

(1) Applied to power planning, it improves the accuracy of load forecasts used in planning, which helps optimize the scale of investment in power projects and reduces wasted investment.

(2) Applied to the planning, dispatching, and market analysis work of grid companies, it improves the accuracy of market analysis, helps avoid both undersupply and severe oversupply, and reduces the losses caused by insufficient power supply and by excess investment.

(3) Applied to the generation plans and coal procurement plans of generating companies, it helps them optimize thermal coal inventories, reduce the capital tied up in coal, and avoid generation shortfalls caused by coal shortages.

(4) Power companies can develop electricity demand forecasting software products based on the invention.

Detailed description

The technical route of the invention is as follows:

1. Basic steps of electricity demand forecasting

Electricity demand forecasting is divided into the following steps:

(1) Determine the forecasting target and content;

(2) Collect the relevant historical data;

(3) Analyze the basic data;

(4) Forecast or obtain data on the factors related to the power system;

(5) Select among forecasting models and methods;

(6) Build the model;

(7) Preprocess the data;

(8) Identify the model parameters;

(9) Evaluate the model and test its significance;

(10) Apply the model to forecast;

(11) Analyze and evaluate the forecasting results comprehensively.

2. Regression forecasting method

Electricity demand is determined by the level of economic development. Regression forecasting models establish the correlation between electricity demand and economic variables and use regression techniques to capture how demand develops. By analyzing historical data, the regression forecasting method explores the intrinsic links between electricity demand and the relevant economic and social factors and the pattern of their changes, and then estimates future electricity demand from forecasts of the region's economic and social development over the planning period. Its task is to determine the relationship between the forecast value and the influencing factors. Regression forecasting builds on the least squares principle and currently comprises univariate linear, multiple linear, and nonlinear regression models.

1. Univariate linear regression model

The univariate linear regression model is expressed as:

$$y = f(S, X) = a + bx + \varepsilon$$

where S is the parameter vector of the model, $S = [a, b]^T$;

x is the independent variable;

y is a random variable that depends on x;

$\varepsilon$ is a random error that follows the normal distribution $N(0, \sigma^2)$, also called the random disturbance.

The residual sum of squares is:

$$Q(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

where $x_i$, $y_i$ ($i = 1, 2, \ldots, n$) are the samples.

The model parameters a and b are estimated by least squares so that Q is minimized, which gives the estimates:

$$\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. The prediction equation is:

$$\hat{y} = \hat{a} + \hat{b}\,x$$
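The least-squares estimates for the univariate model can be sketched in a few lines of plain Python. The function names `fit_simple_ols` and `predict` and the sample figures are illustrative assumptions, not part of the patent.

```python
def fit_simple_ols(xs, ys):
    """Return (a_hat, b_hat) that minimize the residual sum of squares Q(a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimate b_hat.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def predict(a_hat, b_hat, x):
    """Evaluate the prediction equation y_hat = a_hat + b_hat * x."""
    return a_hat + b_hat * x


# Hypothetical data: steel output (x) against electricity consumption (y).
output = [1.0, 2.0, 3.0, 4.0, 5.0]
electricity = [3.1, 4.9, 7.2, 8.8, 11.0]
a_hat, b_hat = fit_simple_ols(output, electricity)
forecast = predict(a_hat, b_hat, 6.0)
```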

2. Multiple linear regression model

Let y be the explained variable and $y_i$ its i-th observation, so that $y = (y_1, y_2, \ldots, y_n)^T$. Let there be k explanatory variables $x = (x_1, x_2, \ldots, x_k)$, let $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^T$ be the regression coefficients, and let $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$ denote the random errors.

Write the observation matrix of the explanatory variables as

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}$$

Then the multiple linear regression model can be written in matrix form:

$$y = X\beta + \varepsilon$$

(1) OLS estimation

① Estimate of the regression coefficients β

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

which satisfies $E(\hat{\beta}) = \beta$ and $\operatorname{cov}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$.

② Random error ε

Each element of the random error ε follows the normal distribution $N(0, \sigma^2)$. With $e = y - X\hat{\beta}$ the residual vector, the estimate of $\sigma^2$ is:

$$\hat{\sigma}^2 = \frac{e^T e}{n - k - 1} \qquad (1\text{-}14)$$
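As a rough illustration, the OLS estimates above can be computed with NumPy. The function name `ols_fit` is an assumption made for this sketch; it prepends the column of ones itself, so the caller passes only the k explanatory variables.

```python
import numpy as np


def ols_fit(X_raw, y):
    """Return (beta_hat, sigma2_hat, cov_beta) for y = X beta + eps.

    X_raw holds the k explanatory variables (n rows, k columns);
    the intercept column of ones is added here.
    """
    X_raw = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    # Solve the normal equations (X^T X) beta = X^T y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    # Unbiased estimate of the error variance: e^T e / (n - k - 1).
    sigma2_hat = resid @ resid / (n - k - 1)
    # Estimated covariance matrix of beta_hat.
    cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
    return beta_hat, sigma2_hat, cov_beta
```

Solving the normal equations directly mirrors the formula in the text; for ill-conditioned data `np.linalg.lstsq` is usually the more robust choice.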

(2) Significance tests for linear correlation

① Correlation coefficient (coefficient of determination)

By the analysis of variance, the total sum of squares decomposes as $S_T^2 = S_R^2 + S_E^2$, that is, into the regression sum of squares $S_R^2$ and the residual sum of squares $S_E^2$.

Coefficient of determination: $R^2 = S_R^2 / S_T^2$. It describes the degree of linear correlation between y and x; the closer it is to 1, the better the fit.

② F test

The F test shows whether the linear relationship between y and x is significant.

Null hypothesis $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$.

If the null hypothesis holds, the statistic

$$F = \frac{S_R^2 / k}{S_E^2 / (n - k - 1)} \sim F(k, n - k - 1)$$

follows the F distribution with $(k, n-k-1)$ degrees of freedom. At a given significance level α, reject the null hypothesis if $F > F_\alpha$; otherwise accept it.

③ t test

The t test checks whether each explanatory variable has a significant effect on the explained variable.

Null hypothesis $H_0$: $\beta_i = 0$.

If the null hypothesis holds, the statistic

$$t = \frac{\hat{\beta}_i}{\sqrt{\hat{\sigma}^2 \left[(X^T X)^{-1}\right]_{ii}}} \sim t(n - k - 1)$$

follows the t distribution with $n-k-1$ degrees of freedom. At a given significance level α, reject the null hypothesis if $|t| > t_{\alpha/2}$; otherwise accept it.
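The coefficient of determination, the F statistic, and the t statistics can be computed together from one fit. The sketch below assumes NumPy and uses the hypothetical name `significance_stats`; it returns only the statistics, and the critical values $F_\alpha$ and $t_{\alpha/2}$ still have to be looked up in tables or taken from a statistics library.

```python
import numpy as np


def significance_stats(X_raw, y):
    """Return (R^2, F statistic, t statistics) for an OLS fit with intercept."""
    X_raw = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    s_e = resid @ resid                    # residual sum of squares S_E^2
    s_t = np.sum((y - y.mean()) ** 2)      # total sum of squares S_T^2
    s_r = s_t - s_e                        # regression sum of squares S_R^2
    r2 = s_r / s_t
    f_stat = (s_r / k) / (s_e / (n - k - 1))
    sigma2_hat = s_e / (n - k - 1)
    # One t statistic per coefficient, intercept first.
    t_stats = beta_hat / np.sqrt(sigma2_hat * np.diag(xtx_inv))
    return r2, f_stat, t_stats
```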

(3) Forecasting

Given the values of the explanatory variables for period n+1, $C = (1, x_{n+1,1}, \ldots, x_{n+1,k})^T$, the prediction interval for $y_{n+1}$ at confidence level $1-\alpha$ is

$$C^T\hat{\beta} \pm t_{\alpha/2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

The term under the square root is written for a single explanatory variable whose new value is $x_0$; with k explanatory variables it generalizes to $1 + C^T (X^T X)^{-1} C$.
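A prediction interval can be sketched as follows. This version uses the general matrix form $1 + C^T (X^T X)^{-1} C$ under the square root, which reduces to the single-variable expression when k = 1. The function name `prediction_interval` is an assumption, and the critical value `t_crit` must be supplied by the caller (for example from a t table).

```python
import numpy as np


def prediction_interval(X_raw, y, x_new, t_crit):
    """Return (lower, upper) bounds of the prediction interval at x_new.

    t_crit is the two-sided critical value t_{alpha/2}(n - k - 1),
    taken from a table; it is not computed here.
    """
    y = np.asarray(y, dtype=float)
    X_raw = np.asarray(X_raw, dtype=float).reshape(len(y), -1)
    n, k = X_raw.shape
    X = np.column_stack([np.ones(n), X_raw])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - k - 1))
    # C = (1, x_{n+1,1}, ..., x_{n+1,k})^T
    c = np.concatenate([[1.0], np.atleast_1d(np.asarray(x_new, dtype=float))])
    centre = c @ beta_hat
    half_width = t_crit * sigma_hat * np.sqrt(1.0 + c @ xtx_inv @ c)
    return centre - half_width, centre + half_width
```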

(4) Related issues in regression analysis

① Multicollinearity

An important assumption of the multiple linear regression model is that the explanatory variables are not linearly related to one another. In practice, however, a multiple linear regression model necessarily includes two or more explanatory variables, and these are always correlated to some degree.

ⅰ Consequences of multicollinearity

Under perfect collinearity, $(X^T X)^{-1}$ does not exist, $\hat{\beta}$ cannot be computed, and the model fails.

Under approximate collinearity, $|X^T X| \approx 0$, so the diagonal entries of $(X^T X)^{-1}$ are very large. Because $\operatorname{cov}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$, the variances of the coefficient estimators also grow, which invalidates the significance tests on the variables and weakens the model's forecasting ability.

ⅱ Testing for multicollinearity

Auxiliary regression test: regress each $x_i$ on the remaining explanatory variables, compute the corresponding coefficient of determination $R_i^2$, and construct a statistic $F_i$ that follows the F distribution with k-2 and n-k-1 degrees of freedom. If $F_i$ exceeds the critical value, $x_i$ is collinear with the remaining variables.
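The auxiliary regressions can be sketched as below. This is a simplified illustration that reports only the coefficient of determination of each auxiliary regression, not the F statistic; values close to 1 point to collinearity. The function name `auxiliary_r2` is an assumption.

```python
import numpy as np


def auxiliary_r2(X_raw):
    """Regress each explanatory variable on the others; return the R^2 values."""
    X_raw = np.asarray(X_raw, dtype=float)
    n, k = X_raw.shape
    r2 = []
    for i in range(k):
        target = X_raw[:, i]
        others = np.delete(X_raw, i, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        total = np.sum((target - target.mean()) ** 2)
        r2.append(1.0 - resid @ resid / total)
    return np.array(r2)
```

A related diagnostic in common use is the variance inflation factor, $1 / (1 - R_i^2)$, which can be derived directly from the values returned here.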

ⅲ Remedies for multicollinearity

There are three main remedies: stepwise regression, principal component analysis, and reducing the variance of the estimators (by increasing the sample size or merging variables).

Stepwise regression: with y as the explained variable, introduce the explanatory variables one at a time, form the regression model, and estimate it. If the goodness of fit changes significantly, the newly introduced variable has an effect; if it does not, the new variable is collinear with the variables already introduced.

② Heteroscedasticity

In the multiple linear regression model, the variance of the error term $\varepsilon_i$ is assumed not to depend on i, that is, $\sigma_i^2 = \sigma^2$ is constant. However, because of model mis-specification (a nonlinear relationship specified as linear), omitted explanatory variables, measurement error in the data, and similar causes, the $\sigma_i^2$ may differ. The variance estimates then become inaccurate, so the OLS estimator no longer has minimum variance and the t test loses its meaning. Heteroscedasticity arises mostly in cross-sectional data.

Common tests for heteroscedasticity include the Goldfeld-Quandt (G-Q) test (for monotonic variance and large samples, where all assumptions other than homoscedasticity hold), the White test, the Glejser test, and the ARCH test (for time series and large samples).

The White test checks for heteroscedasticity by constructing a $\chi^2$ statistic from an auxiliary regression. Take a regression with two explanatory variables as an example: $Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + u_t$.

The null hypothesis of the White test is $H_0$: $u_t$ in the equation above is homoscedastic; the alternative hypothesis is $H_1$: $u_t$ is heteroscedastic.

Step 1: Estimate the regression model by OLS and obtain the residuals $e_t$.

Step 2: Estimate the following auxiliary regression:

$$e_t^2 = a_0 + a_1 X_{t1} + a_2 X_{t2} + a_3 X_{t1}^2 + a_4 X_{t2}^2 + a_5 X_{t1} X_{t2} + v_t$$

Step 3: Compute the coefficient of determination $R^2$ of the auxiliary regression and the statistic $TR^2 \sim \chi^2(k)$, where T is the sample size and the degrees of freedom k is the number of explanatory terms in the auxiliary regression.

Step 4: The decision rule of the White test is: if $TR^2 \le \chi^2_\alpha(k)$, accept $H_0$; if $TR^2 > \chi^2_\alpha(k)$, reject $H_0$.
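The four steps of the White test for the two-variable case can be sketched with NumPy. The function name `white_statistic` is an assumption; it returns the statistic $TR^2$ and its degrees of freedom, and leaves the comparison with the $\chi^2$ critical value to the caller.

```python
import numpy as np


def white_statistic(x1, x2, y):
    """Return (T * R^2, degrees of freedom) of the White auxiliary regression."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    T = len(y)
    # Step 1: OLS on the original model, then squared residuals.
    X = np.column_stack([np.ones(T), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    # Step 2: auxiliary regression on levels, squares and the cross product.
    Z = np.column_stack([np.ones(T), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    a, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ a
    # Step 3: R^2 of the auxiliary regression and the statistic T * R^2.
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return T * r2, Z.shape[1] - 1
```

With five auxiliary terms the 5% critical value is $\chi^2_{0.05}(5) \approx 11.07$, so a returned statistic above that value would lead to rejecting $H_0$ at the 5% level.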

③ Autocorrelation

In the multiple linear regression model, $E(\varepsilon\varepsilon^T \mid X) = \sigma^2 I$. Because economic variables respond with a lag, or because the model is mis-specified, it can happen that $E(\varepsilon_i \varepsilon_j) \neq 0$ for $i \neq j$; that is, the errors are correlated, either spatially (cross-sectional data) or over time (time series). The covariance matrix of ε is then $D(\varepsilon) = \sigma^2 \Omega$ with $\Omega \neq I$. Like heteroscedasticity, autocorrelation means the OLS estimator is no longer BLUE.

ⅰ Testing for autocorrelation

Tests for autocorrelation include the Durbin-Watson (D-W) test and the Lagrange multiplier (LM) test.

The Durbin-Watson statistic measures the first-order serial correlation of the residuals and is calculated as:

$$DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}_t^2}$$

DW lies between 0 and 4. A value near 2 indicates no first-order serial correlation; whether correlation is present is judged by looking up the critical values in a table according to the number of observations.
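The Durbin-Watson statistic is simple enough to compute directly from a list of residuals. The function name `durbin_watson` is chosen for this sketch only.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic of a residual sequence."""
    # Sum of squared first differences, t = 2 .. T.
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Sum of squared residuals, t = 1 .. T.
    den = sum(u * u for u in residuals)
    return num / den


# Alternating residuals (negative autocorrelation) push DW above 2;
# constant residuals (positive autocorrelation) give DW = 0.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```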

The DW statistic applies only to first-order autocorrelation and is not suitable for testing higher orders. The LM statistic gives a more widely applicable test that covers both first-order and higher-order autocorrelation. The LM test is carried out through an auxiliary regression, in the following steps:

Step 1: Assume the error term follows an n-th order autoregression

$u_t = \rho_1 u_{t-1} + \cdots + \rho_n u_{t-n} + v_t$, where $v_t$ is a random term that satisfies the usual assumptions.

Step 2: The null hypothesis is $H_0$: $\rho_1 = \rho_2 = \cdots = \rho_n = 0$.

Step 3: Set up the auxiliary regression on the residuals:

$$e_t = \hat{\rho}_1 e_{t-1} + \cdots + \hat{\rho}_n e_{t-n} + \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + v_t$$

Step 4: Compute the coefficient of determination $R^2$ and construct the LM statistic:

$$LM = TR^2$$

The LM statistic asymptotically follows the $\chi^2(n)$ distribution. The decision rule is: if $LM = TR^2 \le \chi^2_\alpha(n)$, accept $H_0$;

if $LM = TR^2 > \chi^2_\alpha(n)$, reject $H_0$.
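The LM steps can be sketched as follows. The function name `lm_statistic` is an assumption, and so is the handling of the first lags: residuals before the start of the sample are set to zero, which is one common convention and is not specified in the text.

```python
import numpy as np


def lm_statistic(X_raw, y, order):
    """Return LM = T * R^2 from the residual auxiliary regression."""
    y = np.asarray(y, dtype=float)
    X_raw = np.asarray(X_raw, dtype=float).reshape(len(y), -1)
    T = len(y)
    X = np.column_stack([np.ones(T), X_raw])
    # Residuals of the original regression.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Lagged residuals e_{t-1}, ..., e_{t-order}; pre-sample values are zero.
    lags = np.column_stack(
        [np.concatenate([np.zeros(j), e[:-j]]) for j in range(1, order + 1)]
    )
    # Auxiliary regression of e_t on its lags and the original regressors.
    Z = np.column_stack([lags, X])
    g, *_ = np.linalg.lstsq(Z, e, rcond=None)
    v = e - Z @ g
    r2 = 1.0 - (v @ v) / np.sum((e - e.mean()) ** 2)
    return T * r2
```

For a first-order test the result would be compared with $\chi^2_{0.05}(1) \approx 3.84$ at the 5% level.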

ⅱ Correcting heteroscedasticity and autocorrelation: generalized least squares (GLS)

A model with heteroscedasticity and autocorrelation can be written as a generalized linear model:

$$y = X\beta + \varepsilon, \qquad D(\varepsilon) = \sigma^2 \Omega$$

where Ω is a positive definite matrix. $\Omega^{-1}$ is then also positive definite, so there exists an invertible matrix G such that $\Omega^{-1} = G^T G$. Transforming the original linear model by G gives:

$$Gy = GX\beta + G\varepsilon$$

Hence $D(G\varepsilon) = \sigma^2 G \Omega G^T$, and because $G \Omega G^T = I$, $D(G\varepsilon) = \sigma^2 I$.

The transformed model (1-19) is therefore free of autocorrelation and heteroscedasticity, and applying OLS to it gives:

$$\hat{\beta} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y$$

This GLS estimator of β is the BLUE of β.

The OLS estimate of $\sigma^2$ obtained from the transformed model is an unbiased and consistent estimator of $\sigma^2$.
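The equivalence between the closed-form GLS estimator and OLS on the G-transformed model can be checked numerically. In this sketch the matrix G is obtained from a Cholesky factorization of $\Omega^{-1}$, which is one possible choice of G; the function names and the assumption that Ω is known are illustrative.

```python
import numpy as np


def gls_fit(X, y, omega):
    """Closed-form GLS: (X^T Omega^-1 X)^-1 X^T Omega^-1 y.

    X must already contain the intercept column.
    """
    omega_inv = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)


def gls_via_transform(X, y, omega):
    """GLS as OLS on the transformed model G y = G X beta + G eps."""
    # Cholesky gives Omega^-1 = L L^T, so G = L^T satisfies Omega^-1 = G^T G.
    L = np.linalg.cholesky(np.linalg.inv(omega))
    G = L.T
    beta, *_ = np.linalg.lstsq(G @ X, G @ y, rcond=None)
    return beta
```

In practice Ω is rarely known and has to be estimated first, for example from the residual autocorrelation; that step is outside this sketch.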

Only a multiple linear regression model that passes these hypothesis tests can be applied in practice.

3. Nonlinear regression model

In a nonlinear regression model, the relationship between the independent and dependent variables is nonlinear, which is common in real systems. Nonlinear regression models are complex and hard to work with, however, so the nonlinear models in common use are mainly those that can be converted into a linear relationship by a suitable change of variables.

(1) Hyperbolic model:

(2) Power function curve model: $y = a x^b$ $(x > 0, a > 0)$

(3) Exponential curve model: $y = a e^{bx}$ $(a > 0)$

(4) Inverted exponential curve model:

(5) S-curve model:
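As an example of the change of variables, the power function model $y = a x^b$ becomes linear after taking logarithms: $\ln y = \ln a + b \ln x$. The sketch below fits it by ordinary least squares on the transformed data; the function name `fit_power_curve` is an assumption, and all x and y values must be positive.

```python
import math


def fit_power_curve(xs, ys):
    """Fit y = a * x**b by linear regression of ln(y) on ln(x)."""
    u = [math.log(x) for x in xs]
    v = [math.log(y) for y in ys]
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    # Slope of the linearized model is the exponent b.
    b = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / sum(
        (ui - u_bar) ** 2 for ui in u
    )
    # Intercept of the linearized model is ln(a).
    ln_a = v_bar - b * u_bar
    return math.exp(ln_a), b
```

The exponential model $y = a e^{bx}$ can be handled the same way by regressing $\ln y$ on x. Note that least squares on the log scale minimizes relative rather than absolute errors.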

3. Correlation analysis of electricity consumption in the iron and steel industry

3.1 Correlation analysis method

Correlation analysis describes how strongly variables are related and expresses that strength with suitable statistical indicators. Its purpose here is to identify the factors that significantly affect electricity consumption in the iron and steel industry, as the basis for building a model.

3.1.1 Correlation coefficient

The correlation coefficient, also called the product-moment correlation coefficient, is a commonly used indicator for describing the degree of linear correlation quantitatively.

Under the above assumptions, let $(X_1, X_2)$ follow a bivariate normal distribution in which $X_1$ and $X_2$ each have their own mean and variance, and let $\rho$ be the correlation coefficient between $X_1$ and $X_2$. The correlation coefficient $\rho$ is given by:

$\rho = \dfrac{\mathrm{Cov}(X_1, X_2)}{\left[\mathrm{Var}(X_1)\,\mathrm{Var}(X_2)\right]^{1/2}}$ (5.1)

Applying the method of moments gives the following moment estimate of $\rho$:

$r = \dfrac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\left[\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2 \sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2\right]^{1/2}}$ (5.2)
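Equation (5.2) translates directly into code. The sketch below is illustrative only; the function name and the sample values are assumptions.

```python
import numpy as np

def sample_correlation(x1, x2):
    """Moment estimate r of the correlation coefficient, as in Eq. (5.2)."""
    d1 = x1 - x1.mean()
    d2 = x2 - x2.mean()
    return np.sum(d1 * d2) / np.sqrt(np.sum(d1**2) * np.sum(d2**2))

# Hypothetical paired observations.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
r = sample_correlation(x1, x2)
```

The result agrees with `np.corrcoef(x1, x2)[0, 1]`, which computes the same quantity.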

3.1.2 Stepwise regression analysis

In practice one usually wants to choose, from the many variables that affect the dependent variable y, a subset to serve as independent variables, and to use multiple regression to build an "optimal" regression equation for predicting or controlling y. "Optimal" here means that the equation contains every independent variable with a significant effect on y and none without. Stepwise regression was proposed on this principle. Its main idea is to introduce the candidate variables into the regression equation one at a time, in decreasing order of their significance for (or contribution to) y; variables with no significant effect on y may never be introduced. A variable already in the equation may also lose its importance once new variables are introduced, and must then be removed. Introducing or removing one variable counts as one step of the stepwise regression, and an F test is carried out at every step so that, before a new variable is introduced, the equation contains only variables that significantly affect y, the insignificant ones having been removed.

Each step proceeds as follows. For the variables already in the regression equation, compute the partial regression sum of squares (that is, the contribution) of each, take the variable with the smallest value, and test its significance at a preset F level. If it is significant, it stays in the equation, and no other variable in the equation needs to be removed either. If it is not significant, it is removed, and the remaining variables are then F-tested in increasing order of partial regression sum of squares, until every variable without a significant effect on y has been removed and only significant ones remain. Next, compute the partial regression sum of squares of each variable not yet in the equation, take the one with the largest value, and test it at the given F level; if it is significant, introduce it into the equation. The process continues until no variable in the equation can be removed and no new variable can be introduced, at which point the stepwise regression ends.
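The procedure described above can be sketched as follows. This is a simplified illustration rather than the implementation used in the source: the function names, the fixed F thresholds `f_in` and `f_out` (set to 4.0 here) and the simulated data are all assumptions.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of OLS on an intercept plus the chosen columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def stepwise_select(X, y, f_in=4.0, f_out=4.0):
    """Return the column indices kept by forward/backward stepwise selection."""
    n, k = X.shape
    selected = []
    while True:
        # Removal step: test the variable with the smallest partial sum of squares.
        if selected:
            full = rss(X, y, selected)
            scale = full / (n - len(selected) - 1)
            partial = {j: rss(X, y, [c for c in selected if c != j]) - full
                       for j in selected}
            weakest = min(partial, key=partial.get)
            if partial[weakest] / scale < f_out:
                selected.remove(weakest)
                continue
        # Entry step: test the candidate with the largest partial sum of squares.
        candidates = [j for j in range(k) if j not in selected]
        if not candidates:
            return sorted(selected)
        base = rss(X, y, selected)
        gain = {j: base - rss(X, y, selected + [j]) for j in candidates}
        best = max(gain, key=gain.get)
        full = base - gain[best]
        f_stat = gain[best] / (full / (n - len(selected) - 2))
        if f_stat <= f_in:
            return sorted(selected)
        selected.append(best)

# Simulated data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=120)
selected = stepwise_select(X, y)
```

A production implementation would normally take the thresholds from the F distribution at a chosen significance level instead of a fixed constant.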

3.1.3 Granger causality test

The Granger causality test was proposed and developed by Professor Clive Granger, a winner of the 2003 Nobel Prize in Economics. Compared with correlation coefficients and regression analysis, it characterizes the causal relationship between variables through the temporal order of their changes, separating causality from correlation in a statistical sense. Combined with sound economic theory, it can reflect the economic dynamics among real-world actors more accurately.

Granger's definition of causality uses the concept of an information set and emphasizes the temporal order of events. Let $I_n$ be all the information in the universe up to period $n$, $Y_n$ be all $Y_t$ $(t = 1, 2, \dots, n)$ up to period $n$, $X_{n+1}$ be the value of $X$ in period $n+1$, and $I_n - Y_n$ be all information other than $Y$. If including $Y$ changes the probability distribution of $X$, that is, $F(X_{n+1} \mid I_n) \neq F(X_{n+1} \mid I_n - Y_n)$, then $Y$ is said to Granger-cause $X$.

In practice, testing the distribution function of a variable is very difficult, and it is simpler to work with expectations and test whether $E(X_{n+1} \mid I_n) \neq E(X_{n+1} \mid I_n - Y_n)$. If $\delta_{n+1} = E(X_{n+1} \mid I_n) - E(X_{n+1} \mid I_n - Y_n)$ is significantly different from 0, then $Y$ is a Granger cause of $X$. The test was later extended to use forecast accuracy: if $\sigma^2(X_{n+1} \mid I_n) < \sigma^2(X_{n+1} \mid I_n - Y_n)$, then $Y$ is a Granger cause of $X$. The information set $I_n$ contains all relevant variables as well as infinitely many lags of $X$ and $Y$. In reality the full information $I_n$ is unavailable, so the relationship between variables can only be tested conditional on the available information set $J_n$, and the resulting Granger causality is relative to $J_n$.
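One common way to operationalize the forecast-accuracy criterion is to compare two OLS regressions: a restricted one that predicts $X$ from its own lags, and an unrestricted one that also includes lags of $Y$. The sketch below computes the resulting F statistic. The function names, the lag length and the simulated series are assumptions, and the source does not state which test variant was used.

```python
import numpy as np

def _rss(A, b):
    """Residual sum of squares of the OLS fit of b on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    resid = b - A @ beta
    return float(resid @ resid)

def granger_f_stat(x, y, lags=1):
    """F statistic for H0: lags of y do not help predict x."""
    n = len(x) - lags
    target = x[lags:]
    x_lags = [x[lags - k: len(x) - k] for k in range(1, lags + 1)]
    y_lags = [y[lags - k: len(y) - k] for k in range(1, lags + 1)]
    ones = np.ones(n)
    rss_r = _rss(np.column_stack([ones] + x_lags), target)           # restricted
    rss_u = _rss(np.column_stack([ones] + x_lags + y_lags), target)  # unrestricted
    df = n - 2 * lags - 1
    return ((rss_r - rss_u) / lags) / (rss_u / df)

# Simulated series in which y leads x by one period, but not the reverse.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
x = np.zeros(200)
x[1:] = 0.8 * y[:-1] + 0.1 * rng.normal(size=199)
f_y_causes_x = granger_f_stat(x, y, lags=1)
f_x_causes_y = granger_f_stat(y, x, lags=1)
```

A large value of `f_y_causes_x` relative to the F critical value rejects the null hypothesis that $Y$ does not Granger-cause $X$.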

3.1.4 Seasonal adjustment

The X-12 method was established and developed by the U.S. Census Bureau of the Department of Commerce on the basis of the ratio-to-moving-average method. It can adapt to the nature of different economic indicators and lets the user choose the calculation mode according to the purpose of the seasonal adjustment; when no choice is made, it selects the calculation mode automatically from the characteristics of the data, using built-in statistical criteria. During the calculation, moving averages of different lengths are used according to the size of the irregular component in the data: the larger the irregular component, the longer the moving average. After repeated revision and improvement, X-12 has become a refined and representative seasonal adjustment method. It has been adopted by official and private bodies in Europe, the United States, Japan and elsewhere, as well as by international institutions such as the IMF, and is now the seasonal adjustment method in general use.

3.1.5 Trend adjustment

Trend refers to the tendency of a time series $X_t$ to change over time. X-12 seasonal adjustment yields the trend-cycle component $TC_t$ of the series ($TC_t = T_t + C_t$). To obtain the long-term trend, the trend term $T_t$ and the cyclical term $C_t$ must be separated. Relatively simple methods such as regression analysis, exponential smoothing and differencing can handle the trend part of a time series; the method mainly used here is the HP (Hodrick-Prescott) filter.
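The HP filter chooses the trend $T_t$ that minimizes $\sum_t (TC_t - T_t)^2 + \lambda \sum_t (T_{t+1} - 2T_t + T_{t-1})^2$, which has the closed-form solution $T = (I + \lambda D^T D)^{-1} TC$, where $D$ is the second-difference matrix. The sketch below implements that solution directly. The function name is an assumption, and the smoothing parameter $\lambda = 14400$ is the conventional choice for monthly data, not a value stated in the source.

```python
import numpy as np

def hp_filter(series, lam=14400.0):
    """Split a series into (trend, cycle) with the Hodrick-Prescott filter."""
    n = len(series)
    # Second-difference operator: each row applies (1, -2, 1).
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, series)
    return trend, series - trend

# A purely linear series has zero second differences,
# so the filter should return it unchanged as the trend.
linear = 10.0 + 0.5 * np.arange(24)
trend, cycle = hp_filter(linear)
```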

3.1.6 Approach to the correlation analysis

The correlation analysis has three purposes: first, to quantify the influencing factors found in the qualitative analysis and establish indicators that describe them quantitatively; second, to verify the qualitative analysis through quantitative analysis of those indicators; third, to screen the third-level influencing factors and obtain the important ones for use in modeling.

To these ends, the correlation analysis combines correlation coefficients, the Granger causality test and regression analysis. Each of the three methods has strengths and weaknesses; using all three lets them complement and cross-check one another, which makes the identification of influencing factors and the correlation analysis more reliable.

The steps of the correlation analysis are:

(1) Quantify the influencing factors and determine the set of indicators. Based on qualitative research, the third-level influencing factors of the energy-intensive industry are identified: for example, domestic demand is broken down by downstream industry, and international demand is represented by import and export volumes and policy. Real estate investment is used to describe demand from the real estate sector, automobile output to describe demand from the automobile sector, and so on.

(2) Plot each influencing factor against electricity consumption and use the trend comparison to make a preliminary assessment of how they are related.

(3) Compute the correlation coefficients between electricity consumption, product output and the influencing factors, and analyze their correlation.

(4) Use the Granger causality test to assess statistically whether there is a causal relationship between electricity consumption and the influencing factors.

(5) Apply stepwise regression to the factors selected by the previous two methods and remove the unimportant ones.

(6) Combine the results of the three methods to obtain the main factors influencing the industry's electricity consumption.

3.2 Correlation analysis of electricity consumption in the iron and steel industry

3.2.1 Indicator set of influencing factors

1. Analysis and construction of the indicator set

The factors influencing electricity consumption in the iron and steel industry, and the related indicators, are summarized in the table below.

Table 5.1 Summary of factors and indicators affecting electricity consumption in the iron and steel industry

2. Data collection and processing

The data used in the study comprise electricity consumption data, industry data and product data. The electricity data are the electricity consumption of the ferrous metal smelting and rolling processing industry from January 2004 to December 2009. The product data are the output of crude steel, steel products and pig iron. The downstream industry data mainly cover product output and investment in each downstream industry, together with the domestic composite steel price index, the raw material price index and other data bearing on product gross margin.

The main data sources are the website of the National Bureau of Statistics, the China Economic Network statistical database, the China Iron and Steel Industry Association website, China Iron and Steel Network, Steel Home, Soushu and others. Data processing covers two aspects: investment data and price data. Investment data are adjusted with the fixed-asset investment price index (for the current quarter) to convert nominal investment into real investment. Price data are converted into fixed-base series with January 2004 as the base period.
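The two processing steps can be sketched as below. The source does not give the exact formulas, so this assumes that the price index is expressed with base period = 100 and that rebasing means dividing every value by the base-period value; the function names and numbers are hypothetical.

```python
def to_real_investment(nominal, price_index):
    """Deflate nominal investment by a price index (assumed base period = 100)."""
    return [value / (index / 100.0) for value, index in zip(nominal, price_index)]

def rebase_index(index, base_position=0):
    """Convert an index series to a fixed-base series (base period = 100)."""
    base = index[base_position]
    return [100.0 * value / base for value in index]

# Hypothetical figures: nominal investment and the matching price index.
real = to_real_investment([210.0, 330.0], [105.0, 110.0])
# Hypothetical price series whose first entry plays the role of January 2004.
fixed_base = rebase_index([80.0, 100.0, 120.0])
```

Because the fixed-asset investment price index is quarterly while the other series are monthly, each month would first have to be matched to the index of its quarter; that lookup is omitted here.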

3.2.2 Results of the correlation analysis

1. Screening results for the first-level influencing factors

The combined analysis shows that all three product output indicators (crude steel output, steel output and pig iron output) are highly correlated with electricity consumption. Of the three, steel output has the strongest correlation with electricity consumption, so it can be considered as the independent variable when building the forecasting model.

2. Screening results for the second- and third-level influencing factors

The third-level indicators were screened in successive rounds using correlation coefficient analysis, the Granger causality test and stepwise regression, yielding a summary of the correlation results. After this combined screening, the following indicators have the most significant effect on electricity consumption in the iron and steel industry: real estate development investment, fixed-asset investment in the transportation sector, automobile output, the domestic composite steel price index and the raw material price index.

Table: Summary of correlation analysis results for the third-level influencing factors

4. Forecasting model of electricity consumption in the iron and steel industry

4.1 Regression model of electricity consumption on steel output

4.1.1 Regression of absolute electricity consumption on absolute steel output

Let $y_t$ be the observed electricity consumption and $x_t$ the observed steel output. The model is first estimated by ordinary least squares (OLS):

$y_t = c + \beta x_t + u_t$ (7.1)

This yields the basic model information together with the coefficient tests and the goodness of fit.

1. Construction and testing of the initial model

A simple linear regression model is built for electricity consumption and steel output in the iron and steel industry, with steel output as the explanatory variable and electricity consumption as the explained variable. All data are absolute values, and the sample runs from January 2004 to December 2009, 72 monthly observations in total. The main parameters and test statistics, computed with EViews, are shown in Table 7.1.

Table 7.1 Regression results and related information for electricity consumption and output

Table 7.1 gives the regression coefficients and coefficient standard errors for the explanatory variable (steel output) and the constant term; the t statistic of each coefficient and its p-value are used to test the coefficient's significance. A p-value below 0.05 indicates that the coefficient is significant; otherwise the null hypothesis that the coefficient is zero cannot be rejected, meaning that the explanatory variable (or constant term) is not related to the dependent variable. The preliminary regression shows that steel output has a significant effect on the industry's electricity consumption: the coefficient passes the significance test and the coefficient of determination is 0.95. The coefficient of determination measures the goodness of fit; the closer it is to 1, the better the fit.

The actual and fitted values of electricity consumption follow very similar trends, which supports the validity of the estimate. The model's mean absolute error is 868 million kWh and its mean relative error is 3.77%. Tests show, however, that the model exhibits autocorrelation and heteroscedasticity, so the initial model must be adjusted.
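The two error measures quoted above can be computed as in the following sketch. The source does not define them explicitly, so this assumes the usual definitions: the mean of $|y_t - \hat{y}_t|$ and the mean of $|y_t - \hat{y}_t| / |y_t|$. The function names and sample values are hypothetical.

```python
import numpy as np

def mean_absolute_error(actual, fitted):
    """Average of |actual - fitted|, in the units of the data."""
    return float(np.mean(np.abs(actual - fitted)))

def mean_relative_error(actual, fitted):
    """Average of |actual - fitted| / |actual|, as a fraction."""
    return float(np.mean(np.abs(actual - fitted) / np.abs(actual)))

# Hypothetical actual and fitted consumption values.
actual = np.array([100.0, 200.0])
fitted = np.array([110.0, 190.0])
mae = mean_absolute_error(actual, fitted)
mre = mean_relative_error(actual, fitted)
```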

2. Adjustment of the initial model

The autocorrelation is addressed by adding lagged terms of the relevant variables. The resulting regression is shown in the table below.

Table 7.2 Regression results and related information after adding lagged variables

The results in the table show that every variable passes the significance test and that the goodness of fit is 0.97, a very good fit.

Further testing of this model shows that the autocorrelation has been eliminated but heteroscedasticity remains, and heteroscedasticity impairs the efficiency of the OLS estimates. Weighted least squares is therefore applied to the same set of variables, giving the regression results in the table below.

Table 7.3 Regression results and related information for weighted least squares

The results in the table show that the model passes the overall significance test and the coefficient significance tests, with a goodness of fit of 0.99, a very good fit. The model's mean absolute error is 811 million kWh and its mean relative error is 3.53%.

3. Statistical tests of the model

(1) Heteroscedasticity test

In the heteroscedasticity test, the p-value of the F statistic is 0.92, which is greater than 0.05, indicating that the model has no heteroscedasticity.

Table 7.4 ARCH(1) test results

(2) Autocorrelation test

Serial correlation is tested with the Ljung-Box Q statistic. The p-value of the Q statistic at every lag order is greater than 0.05, indicating that the model's disturbance term has no serial correlation.

(3) Normality test of the residuals

The p-value of the Jarque-Bera statistic is 0.72, which is greater than 0.05, so normality of the residuals is not rejected and the t tests are valid.
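The Ljung-Box and Jarque-Bera statistics used in the tests above can be computed from the residuals as in this sketch. The function names and the sample residuals are assumptions; the source reports only EViews output. The Ljung-Box statistic is $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$, and the Jarque-Bera statistic is $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ is the skewness and $K$ the kurtosis.

```python
import math
import numpy as np

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistics for lags 1..max_lag."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = float(e @ e)
    running = 0.0
    stats = []
    for k in range(1, max_lag + 1):
        r_k = float(e[k:] @ e[:-k]) / denom  # lag-k sample autocorrelation
        running += r_k**2 / (n - k)
        stats.append(n * (n + 2) * running)
    return stats

def jarque_bera(resid):
    """Jarque-Bera statistic and its asymptotic p-value."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    d = e - e.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
    # The chi-square(2) survival function is exactly exp(-x / 2).
    return float(jb), math.exp(-jb / 2.0)

# Hypothetical residuals that alternate in sign: strongly autocorrelated.
alternating = np.array([1.0, -1.0] * 10)
q_stats = ljung_box_q(alternating, max_lag=3)
jb_stat, jb_p = jarque_bera(alternating)
```

Under the null hypothesis of no serial correlation, the lag-$m$ Q statistic is compared with a chi-square distribution with $m$ degrees of freedom; for example, the 5% critical value for one degree of freedom is 3.841.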

Taken together, these tests show that the model satisfies the assumptions required by least squares and that the regression model is valid. The relationship between electricity consumption and steel output in absolute terms can therefore be written as:

$y_t = 0.03x_t + 0.42y_{t-1} + 17.57$ (7.2)

Here $y_t$ is the electricity consumption of the iron and steel industry (100 million kWh), $x_t$ is steel output (10,000 tonnes), and $y_{t-1}$ is the industry's electricity consumption in the previous period. Equation (7.2) means that the industry's electricity consumption depends on current steel output and on the previous period's electricity consumption: with the previous period's consumption given, each additional 10,000 tonnes of steel output raises electricity consumption by 0.03 × 100 million kWh, that is, 3 million kWh. As noted above, the model's mean absolute error is 811 million kWh and its mean relative error is 3.53%.
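As a worked example of Equation (7.2), the sketch below evaluates the fitted model for one period. The coefficients come from the equation; the function name and the input values are hypothetical.

```python
def predict_consumption(steel_output, prev_consumption):
    """Eq. (7.2): electricity consumption in 100 million kWh.

    steel_output is in 10,000 tonnes; prev_consumption is the previous
    period's consumption in 100 million kWh.
    """
    return 0.03 * steel_output + 0.42 * prev_consumption + 17.57

# Hypothetical inputs: 5,000 (x 10,000 t) of steel, 300 (x 100 million kWh) last period.
forecast = predict_consumption(5000.0, 300.0)  # 150 + 126 + 17.57 = 293.57
# Marginal effect of one more unit (10,000 t) of steel output.
marginal = predict_consumption(5001.0, 300.0) - forecast
```

Forecasting several periods ahead would feed each forecast back in as `prev_consumption`, so errors accumulate over the horizon.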

4.1.2 Regression of electricity consumption growth on steel output growth

1. Construction and testing of the initial model

The initial regression model here uses the monthly year-on-year growth rates of electricity consumption and steel output. The sample runs from January 2005 to December 2009, 60 monthly observations in total, and the model is estimated with EViews. The results are shown in Table 7.5.

Table 7.5 Regression results and related information for electricity consumption and output

The preliminary regression shows that steel output growth has some effect on electricity consumption growth in the iron and steel industry, and the coefficient on steel output growth passes the significance test. The goodness of fit is 0.67, a mediocre fit, and the mean relative error of 6.7% is fairly large. Tests also show that the model exhibits autocorrelation, so it needs to be adjusted.

2. Adjustment of the initial model

The model is adjusted by adding lagged terms of the relevant variables and is then tested statistically. The regression results after adding the lagged variable are shown in the table below.

Table 7.6 Regression results and related information after adding lagged variables

After the adjustment for autocorrelation, the goodness of fit improves markedly. The model passes the overall significance test and every coefficient passes its significance test. The fitted and actual values follow very similar trends, and the model's mean relative error is 4.82%.

3. Statistical tests of the model

(1) Heteroscedasticity test

In the heteroscedasticity test, the p-value of the F statistic is 0.15, which is greater than 0.05, indicating that the model has no heteroscedasticity.

Table 7.7 ARCH(1) test results

(2) Autocorrelation test

Serial correlation is tested with the Ljung-Box Q statistic over the sample from February 2005 to December 2009. The p-value of the Q statistic at every lag order is greater than 0.05, indicating that the model's disturbance term has no serial correlation.

(3) Normality test of the residuals

The p-value of the Jarque-Bera statistic is 0.38, which is greater than 0.05, so normality of the residuals is not rejected and the earlier t tests are valid.

Taken together, these tests show that the model satisfies the assumptions required by least squares and that the regression model is valid. The relationship between the growth rates of electricity consumption and steel output can therefore be written as:

$y_t = 0.55x_t + 0.60y_{t-1} - 0.13$ (7.3)

Here $y_t$ is the monthly growth rate of electricity consumption in the iron and steel industry, $x_t$ is the monthly growth rate of steel output, and $y_{t-1}$ is the monthly growth rate of electricity consumption in the previous period.

Equation (7.3) means that the growth rate of the industry's electricity consumption depends on the current growth rate of steel output and on the previous period's growth rate of electricity consumption: with the previous period's growth rate given, each 1 percentage point increase in steel output growth raises electricity consumption growth by 0.55 percentage points. As noted above, the model's mean relative error is 4.82%.

4.1.3 Comparison of the models

Table 7.8 compares the two models relating electricity consumption to steel output. Both pass all the tests, which indicates that they are reasonable and valid; both have small fitting errors and a simple form. Of the two, the model for absolute electricity consumption has the smaller fitting error and the better fit.

Table 7.8 Summary of the models relating electricity consumption to output

4.2 Regression model of electricity consumption on the third-level influencing factors

The correlation analysis between the industry's electricity consumption and the downstream influencing factors identified six third-level factors with a relatively significant effect on electricity consumption: completed real estate development investment, fixed-asset investment in the transportation sector, automobile output, steel exports, the domestic composite steel price index and the raw material price index. To forecast the industry's electricity consumption more effectively, the relationship between electricity consumption and these indicators is analyzed by regression below.

Let $y_t$ be the electricity consumption of the iron and steel industry, $x_{1t}$ completed real estate development investment, $x_{2t}$ fixed-asset investment in the transportation sector, $x_{3t}$ automobile output, $x_{4t}$ steel exports, $x_{5t}$ the domestic composite steel price index, and $x_{6t}$ the raw material price index. The model is first estimated by ordinary least squares (OLS):

$y_t = c + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + \beta_5 x_{5t} + \beta_6 x_{6t} + u_t$ (7.4)

Solving this regression gives the fitting error between the actual and estimated values of electricity consumption, from which the validity of the model can be judged initially. As before, multivariate OLS estimation requires the regression series to satisfy certain assumptions, so the final model is also tested for heteroscedasticity, autocorrelation and normality; only a model that passes these tests can forecast electricity consumption effectively.

4.2.1 Regression model of absolute electricity consumption on the absolute values of the third-level influencing factors

1. Establishing and testing the initial model

A multiple linear regression model is built for steel industry electricity consumption and the third-level influencing factors. The explanatory variables are the third-level indicators and the explained variable is electricity consumption; all data are absolute values. The sample covers January 2004 to December 2009, 72 monthly observations in total. The main parameter estimates and test statistics, computed with EViews, are shown in Table 7.9.

Table 7.9 Regression results and related information for electricity consumption and the influencing factors

The preliminary regression gives the relationship between electricity consumption and the six third-level influencing factors. The coefficients of five factors pass the significance test; the exception is fixed-asset investment in the transportation industry. To keep the model valid, this indicator is dropped, which gives the following regression results.

Table 7.10 Regression results and related information for electricity consumption and the reduced set of influencing factors

The second regression shows that the remaining five factors are strongly related to electricity consumption: every coefficient passes the significance test, the coefficient of determination is 0.95, indicating a very good fit, and the model as a whole passes the overall significance test. The mean absolute fitting error is 952 million kWh and the mean relative error is 4.42%, so the fitting accuracy is only moderate. The statistical tests further show that the model exhibits autocorrelation and heteroscedasticity, so the initial model needs to be adjusted.

2. Adjusting the initial regression model

Based on the analysis above, the autocorrelation is removed by adding the one-period lag of the relevant variable to the model. The resulting regression is shown in the table below.

Table 7.11 Regression results and related information after adding the lagged variable

After the adjustment, all coefficients pass the significance test and the coefficient of determination is 0.97, indicating a very good fit. The mean absolute error of the adjusted model is 807 million kWh and the mean relative error is 3.82%, down from 952 million kWh and 4.42%, so the adjustment achieved its purpose of reducing the model error.
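
The text does not spell out how the two error measures are computed. The sketch below assumes the usual definitions (mean of the absolute errors, and mean of the absolute errors divided by the actual values); the function names and sample numbers are illustrative only.

```python
def mean_absolute_error(actual, fitted):
    """Mean of |actual - fitted|, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


def mean_relative_error(actual, fitted):
    """Mean of |actual - fitted| / |actual|, expressed as a percentage."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, fitted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up monthly values, not the patent's data.
actual = [200.0, 250.0, 220.0]
fitted = [190.0, 260.0, 231.0]
mae = mean_absolute_error(actual, fitted)
mre = mean_relative_error(actual, fitted)
```

Under these assumed definitions, comparing `mae` and `mre` before and after an adjustment is the kind of check the paragraph above describes.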

3. Statistical tests of the model

(1) Heteroscedasticity test

Table 7.12 ARCH(1) test results

In the heteroscedasticity test, the p-value of the F statistic is 0.32, which is greater than 0.05, indicating that the model has no heteroscedasticity.
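
For readers unfamiliar with the ARCH(1) test, here is a minimal sketch of its Lagrange multiplier form: the squared residuals are regressed on their own first lag, and the number of observations times R² is compared with a chi-square distribution with one degree of freedom. The patent reports the F-statistic version from EViews; the LM variant and the name `arch1_lm_test` are choices made here for brevity.

```python
import math


def arch1_lm_test(residuals):
    """Return (LM statistic, p-value) for the ARCH(1) LM test."""
    sq = [e * e for e in residuals]
    x, y = sq[:-1], sq[1:]  # regress e_t^2 on e_{t-1}^2 (plus a constant)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # For a regression with one regressor, R^2 is the squared correlation.
    r2 = (sxy * sxy) / (sxx * syy)
    lm = n * r2
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Artificial residuals whose squares alternate, giving strong ARCH-type dependence.
lm_stat, lm_p = arch1_lm_test([1, 2, 1, 2, 1, 2, 1])
```

A p-value above 0.05, as in Table 7.12, means the null hypothesis of no ARCH effect is not rejected. The sketch assumes the squared residuals are not all identical, since R² would otherwise be undefined.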

(2) Autocorrelation test

Serial correlation is tested with the Ljung-Box Q statistic over the sample from February 2004 to December 2009. The p-value of the Q statistic is greater than 0.05 at every lag order, indicating that the disturbance term of the model is not serially correlated.
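
The Ljung-Box Q statistic at lag order h is Q(h) = n(n+2) Σ_{k=1..h} r_k² / (n−k), where r_k is the lag-k sample autocorrelation of the residuals. The sketch below computes the statistic only; the p-values come from a chi-square distribution and are not computed here. The function name `ljung_box_q` is hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Return [Q(1), ..., Q(max_lag)] for the given residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    q_stats = []
    running = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        running += r_k * r_k / (n - k)
        q_stats.append(n * (n + 2) * running)
    return q_stats


# Artificial, strongly autocorrelated residuals (alternating sign).
q_values = ljung_box_q([1, -1, 1, -1, 1, -1], max_lag=2)
```

Large Q values relative to the chi-square critical value indicate serial correlation; the text reports the opposite outcome for the fitted model.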

(3) Normality test of the residuals

The p-value of the Jarque-Bera statistic is 0.72, which is greater than 0.05, indicating that the residuals are normally distributed and hence that the earlier t-tests are valid.
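
The Jarque-Bera statistic combines the sample skewness S and kurtosis K of the residuals as JB = n/6 · (S² + (K−3)²/4) and is compared with a chi-square distribution with two degrees of freedom, whose survival function is exp(−JB/2). The sketch below uses the plain sample size n as the multiplier; some packages apply a degrees-of-freedom correction instead, so results can differ slightly from the EViews output the patent reports.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) for a test of normality."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-square survival function with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Tiny artificial sample, used only to exercise the formula.
jb_stat, jb_p = jarque_bera([-1, 1, -1, 1])
```

A p-value above 0.05 means normality of the residuals is not rejected, which is the conclusion drawn in the text.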

Taken together, these tests show that the model satisfies the assumptions of the least squares method, i.e. the regression model is valid. The relationship between electricity consumption and the absolute values of the influencing factors can therefore be written as:

$$y_t = 1.18x_{1t} + 0.48x_{3t} + 5.10x_{4t} + 0.46x_{5t} - 0.90x_{6t} + 0.43y_{t-1} + 96.18 \tag{7.5}$$

where $y_t$ is the electricity consumption of the steel industry (100 million kWh), $x_{1t}$ is the completed investment in real estate development (10 billion yuan), $x_{3t}$ is automobile output (10,000 vehicles), $x_{4t}$ is the steel export volume (million tons), $x_{5t}$ is the composite domestic steel price index, $x_{6t}$ is the raw material price index, and $y_{t-1}$ is the steel industry electricity consumption of the previous period.

Equation (7.5) means that steel industry electricity consumption is affected by the current-period completed investment in real estate development, automobile output, steel export volume, composite domestic steel price index and raw material price index, and by the electricity consumption of the previous period. Electricity consumption is positively related to completed real estate investment, automobile output, steel export volume, the composite domestic steel price index and the previous period's electricity consumption, and negatively related to the raw material price index of the steel industry. The mean absolute error of the model is 807 million kWh and the mean relative error is 3.82%.
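
Equation (7.5) can be evaluated directly once the explanatory variables for a month are known. The sketch below hard-codes the published coefficients; the function name `predict_consumption` and the input values are hypothetical and chosen only to show the arithmetic, in the units listed above.

```python
def predict_consumption(x1, x3, x4, x5, x6, y_prev):
    """Evaluate Equation (7.5); the result is in 100 million kWh.

    x1: completed real estate investment (10 billion yuan)
    x3: automobile output (10,000 vehicles)
    x4: steel export volume (million tons)
    x5: composite domestic steel price index
    x6: raw material price index
    y_prev: previous-period electricity consumption (100 million kWh)
    """
    return (1.18 * x1 + 0.48 * x3 + 5.10 * x4 + 0.46 * x5
            - 0.90 * x6 + 0.43 * y_prev + 96.18)


# Illustrative inputs only, not observed data.
example_consumption = predict_consumption(
    x1=10, x3=100, x4=5, x5=100, x6=100, y_prev=200
)
```

Because the model contains the lagged dependent variable, a forecast for several months ahead would feed each predicted value back in as `y_prev` for the following month.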

4.2.2 Regression analysis of the growth rate of electricity consumption on the growth rates of the third-level influencing factors

1. Establishing and testing the initial model

Next, a regression model is built on the year-on-year growth rates of electricity consumption and the third-level influencing factors. The sample covers January 2005 to December 2009, 60 monthly observations in total. The model is estimated with EViews and the results are shown in the table below.
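
The 60 growth-rate observations follow from the 72 monthly levels used in Section 4.2.1, because a year-on-year rate needs the value from twelve months earlier. A minimal sketch, assuming the growth rate is defined as the percentage change on the same month of the previous year (the function name and the demo series are hypothetical):

```python
def year_on_year_growth(monthly_values, period=12):
    """Percentage change of each month on the same month one year earlier."""
    return [
        100.0 * (monthly_values[t] / monthly_values[t - period] - 1.0)
        for t in range(period, len(monthly_values))
    ]


# 72 made-up monthly levels (January 2004 to December 2009 in the patent's sample).
demo_levels = [100 + i for i in range(72)]
demo_growth = year_on_year_growth(demo_levels)
```

The first twelve months have no year-earlier counterpart, which is why the growth-rate sample starts in January 2005.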

Table 7.13 Regression results and related information for the growth rate of electricity consumption and the growth rates of the influencing factors

The table shows that the coefficients on the growth rates of all six influencing factors pass the significance test, i.e. the coefficient estimates are valid. The coefficient of determination is 0.86, indicating a good fit, and the mean relative error is 4.12%. The relevant test statistics, however, show that the model is autocorrelated, so it is adjusted for autocorrelation.

2. Regression by generalized least squares

Applying generalized least squares gives the regression results shown in the table below.

Table 7.14 Regression results and related information for generalized least squares

The table shows that the coefficient on raw material prices fails the significance test. Although the model fits well and the relevant test statistics show no autocorrelation under generalized least squares, the growth rate of the raw material price index must still be dropped and the regression re-estimated to ensure that the estimates are valid.

3. Adjusting the initial model

Following the generalized least squares analysis, the growth rate of raw material prices is first removed from the initial regression model. The results are shown in the table below.

Table 7.15 Regression results and related information for the growth rate of electricity consumption and the growth rates of the reduced set of influencing factors

The table shows that the coefficients of all indicators except the growth rate of real estate development investment pass the significance test, so this indicator also has to be removed. In addition, the model is autocorrelated, but generalized least squares does not effectively reduce its fitting error. Therefore, after eliminating the raw material price index on the basis of the generalized least squares results and the growth rate of real estate development investment on the basis of the regression analysis, lagged terms of the relevant variables are added to the model of Table 7.15 to remove the autocorrelation. The resulting regression is shown in the table below.

Table 7.16 Regression results and related information after adding the lagged variables

In this table every indicator passes the significance test, but the relevant statistical tests show that the model is still heteroscedastic. To ensure valid estimates, the model is corrected for heteroscedasticity and re-estimated, giving the results in the table below.

Table 7.17 Regression results and related information for the adjusted model

The table shows that the coefficients of all influencing factors pass the significance test and the coefficient of determination is 0.92, indicating a good fit. The mean relative error of this model is 3.54%, lower than the 4.12% of the initial regression model.

4. Statistical tests of the model

(1) Heteroscedasticity test

The p-value of the F statistic is 0.59, which is greater than 0.05, indicating that the model has no heteroscedasticity.

Table 7.18 ARCH(1) test results

(2) Autocorrelation test

Serial correlation is tested with the Ljung-Box Q statistic over the sample from February 2005 to December 2009. The p-value of the Q statistic is greater than 0.05 at every lag order, indicating that the disturbance term of the model is not serially correlated.

(3) Normality test of the residuals

The p-value of the Jarque-Bera statistic is 0.59, which is greater than 0.05, indicating that the residuals are normally distributed and hence that the earlier t-tests are valid.

Taken together, these tests show that the model satisfies the assumptions of the least squares method, i.e. the regression model is valid. The relationship between the growth rate of electricity consumption and the growth rates of the influencing factors can therefore be written as:

$$y_t = 0.13x_{2t} + 0.19x_{3t} + 0.03x_{4t} + 0.16x_{5t} + 0.65y_{t-1} - 0.11x_{3,t-1} + 0.12x_{3,t-2} \tag{7.6}$$

where $y_t$ is the growth rate of steel industry electricity consumption, $x_{2t}$ is the growth rate of fixed-asset investment in the transportation industry, $x_{3t}$ is the growth rate of automobile output, $x_{4t}$ is the growth rate of the steel export volume, $x_{5t}$ is the growth rate of the composite domestic steel price index, $y_{t-1}$ is the growth rate of steel industry electricity consumption in the previous period, $x_{3,t-1}$ is the growth rate of automobile output in the previous period, and $x_{3,t-2}$ is the growth rate of automobile output two periods earlier. All indicators are current-month growth rates computed from the original data.

Equation (7.6) means that the growth rate of steel industry electricity consumption is affected by the current-period growth rates of fixed-asset investment in the transportation industry, automobile output, steel export volume and the composite domestic steel price index, by the previous period's growth rates of electricity consumption and automobile output, and by the growth rate of automobile output two periods earlier. It is positively related to the growth rates of transportation fixed-asset investment, automobile output, steel export volume and the composite domestic steel price index, to the previous period's growth rate of electricity consumption, and to the growth rate of automobile output two periods earlier; it is negatively related to the previous period's growth rate of automobile output. The mean relative error of the model is 3.54%, so the fit is good.
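
Equation (7.6) can be evaluated in the same way as the absolute-value model. The sketch below hard-codes the published coefficients; the function name `predict_growth` and the sample inputs are hypothetical. It assumes all growth rates are expressed in the same unit (percent here); since the equation has no constant term, the output is in that unit as well.

```python
def predict_growth(x2, x3, x4, x5, y_prev, x3_lag1, x3_lag2):
    """Evaluate Equation (7.6): growth rate of steel industry electricity consumption.

    x2: growth of transportation fixed-asset investment
    x3: growth of automobile output (current period)
    x4: growth of steel export volume
    x5: growth of the composite domestic steel price index
    y_prev: growth of electricity consumption in the previous period
    x3_lag1, x3_lag2: growth of automobile output one and two periods earlier
    """
    return (0.13 * x2 + 0.19 * x3 + 0.03 * x4 + 0.16 * x5
            + 0.65 * y_prev - 0.11 * x3_lag1 + 0.12 * x3_lag2)


# Illustrative growth rates in percent, not observed data.
example_growth = predict_growth(
    x2=20, x3=10, x4=-5, x5=8, y_prev=12, x3_lag1=9, x3_lag2=11
)
```

The opposite signs on the one-period and two-period lags of automobile output growth show up directly in the arithmetic: the first lag reduces the predicted rate and the second raises it.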

4.2.3 Comparative analysis of the models

The two models built above (absolute values in Section 4.2.1 and growth rates in Section 4.2.2) are compared in Table 7.19. Both pass all of the tests, which shows that they are reasonable and valid. Both have small fitting errors and a fairly simple form; of the two, the model that forecasts the growth rate of electricity consumption fits better.

Table 7.19 Summary of the forecasting models based on third-level influencing factors

Claims (1)

1. A method for forecasting electricity demand in the steel industry, characterized in that the method comprises the following steps:
(1) collecting and organizing historical data on electricity consumption and steel output in the steel industry;
(2) building a simple linear regression model of steel industry electricity consumption on steel output, in which the explanatory variable is steel output, the explained variable is electricity consumption and the data are absolute values, and obtaining the regression coefficients and coefficient standard errors of the explanatory variable (steel output) and the constant term, the statistic of each coefficient and its p-value being used to test the validity of the coefficient; if the p-value is less than 0.05 the regression coefficient is adopted, otherwise the hypothesis that the regression coefficient is zero is accepted;
(3) adjusting the model for autocorrelation by adding lagged terms of the relevant variables, and then testing the significance of each indicator, of the model as a whole and of the coefficients;
(4) applying a heteroscedasticity test, an autocorrelation test and a normality test of the residuals to the model, so that the model satisfies the assumptions of the least squares method;
(5) obtaining from the above steps the relationship between electricity consumption and the absolute value of steel output [equation not reproduced in the source text], whose variables are the steel industry electricity consumption (unit: 100 million kWh), the steel output (unit: 10,000 tons) and the steel industry electricity consumption of the previous period;
(6) building a regression model of the year-on-year current-month growth rates of electricity consumption and steel output, and then applying a significance test and an autocorrelation test to the coefficient on the growth rate of steel output;
(7) adjusting the model by adding lagged terms of the relevant variables, and then applying a heteroscedasticity test, an autocorrelation test and a normality test of the residuals, so that the model satisfies the assumptions of the least squares method;
(8) obtaining from the above steps the relationship between the growth rates of electricity consumption and steel output [equation not reproduced in the source text], whose variables are the current-month growth rate of steel industry electricity consumption, the current-month growth rate of steel output and the current-month growth rate of electricity consumption in the previous period.
CN201210504051.9A 2012-11-30 2012-11-30 A kind of Forecasting Methodology of steel industry electricity needs Active CN103258069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210504051.9A CN103258069B (en) 2012-11-30 2012-11-30 A kind of Forecasting Methodology of steel industry electricity needs

Publications (2)

Publication Number Publication Date
CN103258069A CN103258069A (en) 2013-08-21
CN103258069B true CN103258069B (en) 2016-01-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160920

Address after: 430077 No. 47 Xu Dong Road, fruit lake street, Wuchang District, Hubei, Wuhan

Patentee after: Economic and Technological Research Institute of State Grid Hubei Electric Power Co., Ltd.

Patentee after: State Grid Corporation of China

Address before: 430077, 359 East Main Street, Wuchang District, Hubei, Wuhan

Patentee before: Wuhan Central China Power Grid Co., Ltd.

Patentee before: State Grid Corporation of China