CN110956196B - Automatic recognition method for window wall ratio of urban building - Google Patents
- Publication number: CN110956196B (application CN201910964461.3A)
- Authority
- CN
- China
- Legal status: Active (an assumption based on the listed status, not a legal conclusion)
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24: Pattern recognition; classification techniques
- G06T7/62: Image analysis; analysis of geometric attributes of area, perimeter, diameter or volume
- G06V10/267: Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V20/00: Scenes; scene-specific elements
- G06T2207/20081: Indexing scheme for image analysis; training; learning
Abstract
The invention discloses a method for automatically identifying the window-wall ratio of urban buildings. Sample pictures meeting certain shooting requirements are taken and labeled at the pixel level, and the resulting XML files are imported into an improved U-Net framework. Recognition training is carried out separately for the exterior walls and the windows in the pictures, using scaling for walls and window sweeping for windows. After training, a number of samples are selected for prediction and picture repair; when more than 80% of the prediction samples have a recognition error below 10%, the model is considered well trained. A model prediction library is then built from the resulting parameters and applied to window-wall-ratio identification for the mass of buildings in a city. City-wide pictures can be collected by photography or from free map websites. Finally, the pictures to be predicted are fed into the model prediction library, and the window-wall ratio of each building is obtained. The method can be used in urban energy consumption simulation, effectively improving both the speed and the accuracy of building an urban energy model.
Description
Technical Field
The invention belongs to the technical field of building energy conservation, relates to model construction for urban energy consumption simulation, and in particular relates to a method for automatically identifying the window-wall ratio of urban buildings.
Background
To meet energy-saving and emission-reduction targets, current energy consumption must be analyzed and future urban energy demand predicted clearly and accurately, so that reasonable and effective energy policies can be formulated in advance. This urgent need has given rise to the approach of "urban energy consumption simulation".
Urban energy consumption simulation is a "bottom-up" simulation method based on physical models. It is carried out by establishing a three-dimensional model of the city and assigning the model a range of information, including thermal parameters of the building envelope, equipment-system parameters, occupancy schedules, equipment operation schedules and weather data. The German scholar Nouvel (2017) pointed out that the accuracy of urban energy consumption simulation is closely tied to the accuracy of its input parameters. Among these parameters, the three-dimensional city model is both the basis and the key of the simulation: an accurate city model greatly improves simulation accuracy. The American scholar Cerezo Davila (2016) noted that in building three-dimensional city models, the window-wall-ratio information often deviates greatly from the actual value. The window-wall ratio is the ratio of the total area of the external windows (including transparent curtain walls) facing a given direction to the total area of the wall surface facing the same direction (including the window area). This parameter largely determines the lighting and insulation performance of a building and has an important impact on its energy consumption. In practice the index is not easily obtained directly, so to simplify modeling researchers generally assign a random value between 0.1 and 0.8 as the window-wall ratio of a building. This simplification inevitably makes the input parameters inaccurate, which in turn affects the accuracy of the whole urban energy consumption simulation.
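For concreteness, the definition and the conventional random-value simplification can be written out as follows; the numeric example is ours, not from the patent.

```python
import random

def window_wall_ratio(window_area, wall_area):
    """Window-wall ratio: total external window area over total wall
    surface area facing the same direction; per the definition above,
    the wall area already includes the window area."""
    return window_area / wall_area

def random_wwr(rng=random):
    """The conventional 'random value' simplification: draw the
    window-wall ratio uniformly from [0.1, 0.8]."""
    return rng.uniform(0.1, 0.8)

wwr = window_wall_ratio(60.0, 200.0)  # 0.3
```

A facade direction with 60 m² of windows in a 200 m² wall surface thus has a window-wall ratio of 0.3, whereas the random-value method could assign it anything from 0.1 to 0.8.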
In recent years there has been little research on acquiring building window-wall-ratio information. One relatively advanced method obtains it through oblique photography with unmanned aerial vehicles. However, this technique is expensive to implement, drone flights in cities are subject to many regulations, and large-scale photography is difficult, so the method cannot easily acquire window-wall-ratio information for the mass of buildings at the urban scale. Meanwhile, with the continued development of deep learning, image recognition technology has matured and is now widely used for autonomous driving, indoor navigation, street-view analysis and so on. The Spanish scholar Garcia-Garcia (2017) showed that, using algorithms such as convolutional neural networks, a computer can learn from a large number of training samples and achieve semantic segmentation between different objects to accomplish image recognition. Although collecting and organizing training samples takes time and effort, it is worthwhile to build an effective training database, since this is a one-time effort.
Disclosure of Invention
The invention aims to solve the following problem: existing methods for acquiring the window-wall ratio of buildings are either inaccurate or too expensive to extend to modeling the mass of buildings in a city. A low-cost, rapid, accurate and general method for acquiring building window-wall-ratio information in cities is needed, to provide technical support for urban energy consumption simulation research.
To solve this technical problem, the invention discloses a method for automatically identifying the window-wall ratio of urban buildings, used to acquire window-wall-ratio information for the mass of urban buildings. A training library covering a certain number of building facades is established to guide the computer in automatically identifying the window-wall ratio of the remaining buildings; in fact, the number of training samples is far smaller than the number of buildings to be identified. The method greatly improves working efficiency and, compared with the conventional "random value" method, improves the accuracy of urban three-dimensional modeling and hence of urban energy consumption simulation.
To achieve this aim, the invention discloses a method for automatically identifying the window-wall ratio of urban buildings, comprising the following steps:
step one: shooting a sample building elevation to obtain a sample picture;
Shooting sample building facades is the initial step in building the training library. Sample selection has the following requirements:
1. The sample buildings should cover as many building types as possible, for example: residential, shopping mall, office, school, hospital;
2. The sample buildings should cover as many different building styles as possible, for example: office buildings with and without large-area glass curtain walls; buildings with standard rectangular facades and with non-rectangular facades;
3. The sample buildings should be as representative as possible of different construction years, for example: new and old residential estates.
When shooting a building facade, the whole facade need not be captured in full, but the following should be ensured as far as possible:
1. Shoot the target facade from a frontal viewing angle, avoiding upward-looking and tilted shots;
2. The facade should be essentially unobstructed, for example free of tree or car occlusion;
3. Shoot the facade at close range so that the windows do not appear too small.
Mobile devices such as a mobile phone or a camera can be used as the shooting tool.
Step two: labeling a sample picture;
The invention adopts pixel-level labeling, i.e. labeling along the outer contour of each object: in the shot pictures, the contour of the building's exterior wall and the contour of every window are labeled one by one, and each labeled object is given the label "wall" or "window". After labeling is complete, an XML (Extensible Markup Language) file is exported.
Step three: training a model;
1. Choice of training framework: the basic framework adopted by the invention is U-Net. U-Net was originally designed for medical image segmentation; its advantages are that it requires few input feature parameters and suits prediction with small sample sizes and large images. The invention aims to identify building windows and exterior walls automatically, distinguishing exterior walls, windows and other objects by shape and color (reflected in RGB values), which are relatively few feature parameters. It also expects to predict the window-wall ratio of a large number of buildings from a small number of labeled buildings, a small-sample problem. In addition, the resolution of the shot pictures is generally greater than 3000×3000, i.e. large images. In summary, the U-Net architecture suits the needs of the invention well.
2. Improvement of the U-Net architecture: U-Net is an encoder-decoder structure consisting of three parts: feature extraction, upsampling and a bridging channel. The first half performs feature extraction (also called "downsampling", the encoder), the second half performs upsampling (the decoder), and the middle serves as the channel through which gradients flow. Feature extraction downsamples and encodes the input picture into condensed high-level information; upsampling recovers the picture information to produce the required result; the channel helps the model gather global information. The architecture is improved as follows: (1) the number of sampling layers is increased and batch-normalization (BN) layers are added, improving the accuracy of feature extraction; (2) the decoder loads ResNet model parameters pre-trained for the ILSVRC competition, which improves decoding precision and greatly reduces model training time.
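As a rough, hypothetical sketch (not the patented network itself), the shape bookkeeping of such an encoder-decoder can be traced in plain Python: each downsampling level halves the spatial resolution while the channel count doubles, and the upsampling path mirrors this on the way back. The level count and base channel width below are illustrative assumptions.

```python
def unet_shapes(size, levels=4, base_channels=64):
    """Trace (height, width, channels) through a U-Net-style
    encoder-decoder: each encoder level halves the spatial size and
    doubles the channel count; the decoder mirrors it back up."""
    h = w = size
    c = base_channels
    encoder = []
    for _ in range(levels):
        encoder.append((h, w, c))
        h, w, c = h // 2, w // 2, c * 2
    bottleneck = (h, w, c)
    decoder = list(reversed(encoder))  # upsampling path restores shapes
    return encoder, bottleneck, decoder

enc, mid, dec = unet_shapes(512, levels=4)
# enc[0] is (512, 512, 64); the bottleneck is (32, 32, 1024)
```

Adding sampling levels, as the improvement above does, deepens this pyramid; the skip connections between matching encoder and decoder shapes are what carry the fine detail back to full resolution.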
3. Exterior-wall recognition training: to save training time, the U-Net training network requires input pictures of a uniform and not excessively large resolution, preferably between 250×250 and 600×600. The resolution of shot pictures is generally larger than this, so some processing is required. The two common treatments are scaling and window sweeping (dividing the image into blocks and scanning them one by one). Scaling loses picture detail, while window sweeping greatly increases training time. The exterior wall occupies a large part of the picture, so it can be compressed into the U-Net network by scaling; the loss of detail is small and can be ignored. Binary recognition training is then carried out, treating the exterior wall as one class and all other objects as the other.
4. Window recognition training: windows are numerous, but each window's area is small. Scaling would lose too much detail and harm the training effect. Therefore the picture is first swept into small blocks, each of which is put into the U-Net network separately, and binary recognition training for windows is carried out on each block.
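The two preprocessing modes above, scaling for the exterior wall and window sweeping for the windows, can be sketched as follows. This is a minimal illustration under assumed sizes, not the patent's implementation; the 512-pixel block size is an assumption within the 250 to 600 range stated above.

```python
def sweep_tiles(width, height, tile=512):
    """Window sweeping: return the (left, top, right, bottom) boxes
    that cover an image of the given size with fixed-size blocks.
    Edge blocks are clamped to the image border."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left, top,
                          min(left + tile, width),
                          min(top + tile, height)))
    return boxes

def scale_factor(width, height, target=512):
    """Scaling: uniform factor that fits the longer side to the
    target size (used for the exterior wall, where the small loss
    of detail is acceptable)."""
    return target / max(width, height)

boxes = sweep_tiles(3000, 3000, tile=512)  # 6 x 6 = 36 blocks
```

For a typical 3000×3000 photograph this yields 36 blocks per picture, which is why sweeping costs far more training time than a single scaled image.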
Step four: predicting a sample picture;
1. Shooting requirements for prediction samples: in practical application, shots of a target building's facade generally cannot fully meet the training-sample requirements; some tilt, upward-looking angles and tree occlusion are likely. To stay close to real shooting conditions, the shooting requirements for prediction samples are therefore far less strict than for training: a certain amount of tilt and tree occlusion of the building in the picture is allowed.
2. Exterior-wall prediction: the exterior wall and the windows are predicted separately. For the exterior wall, the picture to be predicted is compressed by scaling to the size specified for the training pictures, put into the model for prediction, and restored to its original size once the result is obtained.
3. Window prediction: the picture to be predicted is divided by window sweeping, with block size matching that specified for the training pictures. All blocks are put into the model for prediction, and the results are stitched back to the original size.
4. Automatic repair of missing elements: trees partially occlude some buildings, leaving the predicted exterior walls and some windows incomplete, so these elements must be repaired. The invention adopts repair based on the CRF (Conditional Random Field) algorithm, which corrects the predicted image by combining the shape and color information of the original image.
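A full dense-CRF refinement requires a dedicated library and the original RGB image; as a rough stand-in for the underlying idea, namely correcting isolated mispredicted pixels using local consistency, a simple 3×3 majority vote over a binary mask can be sketched. This is an illustration only, not the patent's CRF repair.

```python
def majority_repair(mask, passes=1):
    """Set each binary pixel to the majority label of its 3x3
    neighborhood; a crude local-consistency repair illustrating
    (not replacing) CRF-style refinement."""
    h, w = len(mask), len(mask[0])
    for _ in range(passes):
        out = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                votes = ones = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            votes += 1
                            ones += mask[ny][nx]
                out[y][x] = 1 if ones * 2 > votes else 0
        mask = out
    return mask

# A single hole inside a solid window region is filled:
patch = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

Unlike this sketch, the CRF used in the patent also weighs the colors of the original photograph, so it can recover window contours hidden behind foliage rather than merely smoothing the mask.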
Step five: model accuracy verification;
After the automatic identification of walls and windows and the automatic repair of their missing parts, the computer automatically counts the exterior-wall and window areas on each predicted picture, and the window-wall ratio η_i is calculated by formula (1):
η_i = S_{i,window} / S_{i,wall}    (1)
where i denotes the i-th predicted picture, S_{i,window} is the total window area in the i-th predicted picture, and S_{i,wall} is the exterior-wall area in the i-th predicted picture. If more than 80% of the prediction samples have an error below 10%, the precision is considered to meet the requirement and step six is executed; otherwise, return to step one and complete steps one to four again.
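The acceptance criterion of step five can be sketched directly; the predicted and ground-truth ratios below are invented for illustration.

```python
def model_accepted(predicted, actual, max_error=0.10, min_share=0.80):
    """Acceptance rule of step five: the model passes when more than
    `min_share` of the prediction samples have a relative
    window-wall-ratio error below `max_error`."""
    ok = sum(1 for p, a in zip(predicted, actual)
             if abs(p - a) / a < max_error)
    return ok > min_share * len(predicted)

# Illustrative predicted vs. ground-truth ratios for five samples:
pred = [0.32, 0.41, 0.18, 0.55, 0.27]
true = [0.30, 0.40, 0.19, 0.54, 0.26]
accepted = model_accepted(pred, true)  # all five errors are below 10%
```

If the criterion fails, the loop back to step one enlarges or relabels the training set before retraining.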
Step six: establishing a prediction model library;
The model network architecture and the relevant setting parameters from steps three and four are retained, forming the final prediction model library.
Step seven: obtaining a picture of the elevation of the urban building;
The target building facade pictures can be obtained by shooting, for example hand-held mobile phone or camera shooting, or vehicle-mounted or airborne camera shooting; or through free map websites, for example Baidu panorama pictures, which greatly reduces the time needed to collect pictures.
Step eight: predicting the elevation picture of the urban building;
The pictures obtained in step seven are put into the model library of step six. The prediction method is the same as in step four, specifically comprising exterior-wall prediction, window prediction and automatic repair of missing elements.
Step nine: obtaining the window wall ratio of the urban building;
From the exterior-wall and window areas finally predicted and repaired in step eight, the window-wall ratio of each target building is calculated by formula (1).
The invention has the following advantages:
1. The invention has low implementation cost. Data acquisition relies only on photography or free map websites, and model training and picture prediction can be completed on an ordinary computer, so the total cost is low;
2. The invention has high prediction precision. It achieves a low prediction error for most prediction samples and automatically repairs incomplete windows and walls, ensuring high precision throughout the prediction process;
3. The invention saves a great deal of time and labor. An effective prediction model library can be established from the labeling and training of a small number of samples, after which the window-wall ratio of the mass of urban buildings is predicted automatically.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an example of pixel-level labeling;
FIG. 3 is a schematic diagram of the improved U-Net architecture;
FIG. 4 shows picture prediction samples: (a) the original picture; (b) exterior-wall recognition prediction; (c) window recognition prediction.
Detailed Description
The present invention is further illustrated by the following drawings and detailed description, which are to be understood as merely explaining the invention and not limiting its scope.
The embodiment discloses an automatic recognition method for window wall ratios of urban buildings, and the following further describes specific embodiments according to fig. 1 to 4:
as shown in fig. 1, the automatic identification method for the window wall ratio of the urban building comprises the following steps:
step one: shooting a sample building elevation to obtain a sample picture;
sample building facade shooting is an initial step of building a training library. For the selection of samples, there are the following requirements:
1. the sample building should cover as much as possible various building types, such as: residential, mall, office, school, hospital, etc.;
2. the sample building should cover as much as possible different building styles, such as: large area glass curtain wall office building and non-large area glass curtain wall office building; standard rectangular elevation building, non-rectangular elevation building, etc.;
3. the sample building should choose different years of building as representative as possible, such as: new, old residential cells, etc.
When shooting a building facade, the whole facade is not required to be shot completely, but the whole facade is ensured as much as possible:
1. shooting the elevation of the target at a front view angle, and keeping the target from looking up and tilting;
2. facade shots are essentially unobstructed, such as: tree shielding, automobile shielding, etc.;
3. the building facade is shot in a short distance to prevent the window from being displayed too small.
The shooting tool can select mobile devices such as a mobile phone and a camera.
Step two: labeling a sample picture;
as shown in FIG. 2, the present invention employs pixel level labeling; the pixel-level labeling refers to labeling along the outer contour of an object, in the invention, specifically, labeling the contour of an outer wall of a building and the contour of each window in a shot picture one by one, and respectively giving labels to the labeled objects: "wall" or "window". After the labeling is completed, the XML file (Extensible Markup Language ) is exported and generated.
Step three: training a model;
1. training model frame selection: the basic framework model adopted by the invention is a Unet framework. The uiet was originally designed for medical image segmentation, which has the advantages that: the required input characteristic parameters are few, and the method is suitable for the prediction of small sample size and large-image photos. The invention aims to automatically identify the windows of the building and the outer walls, and distinguish the outer walls, the outer windows and other objects by adopting characteristic parameters of shapes and colors (reflected by RGB values), wherein the characteristic parameters are relatively less. The invention expects to predict the window wall ratio of a large number of buildings through a small number of building marks, and is a small sample problem. In addition, the resolution of the photographed picture is generally greater than 3000 x 3000, which is a large image picture. In summary, the Unet architecture is well suited to the needs of the present invention.
Improvement of the uiet architecture (as shown in fig. 3): the Unet architecture is also called an encoder-decoder structure and consists of three parts, wherein the Unet architecture comprises feature extraction, upsampling and channel; the first half acts as feature extraction (also called "downsampling", encoder section), the second half as upsampling (decoder section), and the middle as channels for gradient flow. The feature extraction has the following effects: the input picture is subjected to downsampling coding to obtain high-resolution information of concentrated original picture information; the purpose of up-sampling is to recover the picture information to obtain the required result; the purpose of the channel is to facilitate the model to better collect global information. The improvement of the Unet architecture is as follows: (1) Increasing the sampling layer number and adding bn layers, and improving the accuracy of feature extraction; (2) The decoder calls the pre-trained ResNet model parameters in the ILSVRC match, so that the decoding precision can be improved, and the model training time can be greatly reduced.
3. Outer wall identification training: in order to save the time cost of training, the Unet training network requires that the input picture resolution is uniform and not excessively large, and is preferably between 250×250 and 600×600. However, in general, the resolution of a photographed picture is generally larger than the required value, and a certain process is required. The common treatment modes are as follows: scaling and windowing (windowing refers to dividing an image and then scanning one by one). Scaling can lose detail of the picture, while sweeping the window can greatly increase training time. For the outer wall, the picture is large, the picture can be compressed into the Unet network in a scaling mode, the detail loss is small, and the detail loss can be ignored. Then, binary recognition training is carried out, namely, the outer wall is taken as one type, and other objects are taken as the other type to carry out recognition learning.
4. Window identification training: the window is characterized in that: the number is large, but the area of each window is small. If the window is scaled, excessive details are lost, and the training effect is affected. Therefore, firstly, the picture needs to be subjected to window scanning treatment, and the picture is divided into small blocks and respectively put into a Unet network. Then, the binary recognition training of the window is performed for each small block.
Step four: sample picture prediction (as shown in fig. 4);
1. predicting sample picture shooting requirements: in practical application, shooting of the outer facade of the target building generally cannot completely meet shooting requirements of training samples, and some problems of inclination, looking up for shooting and tree shielding are likely to exist. In fact, in order to be as close as possible to the actual shooting situation, the picture shooting requirements of the predicted samples are far less stringent than when training, which allows for certain tilt and tree shading problems for the building in the picture.
2. And (3) predicting an outer wall: the prediction of the exterior wall and the window is performed separately. When the outer wall is predicted, the picture to be predicted is compressed to the size specified by the training picture in a scaling mode, the picture is put into a model for prediction, and the picture is restored to the original size after a result is obtained.
3. Window prediction: when the window is predicted, the picture to be predicted is segmented in a window sweeping mode, and the segmentation size is consistent with the size specified by the training picture. And then, putting all the segmented pictures into a model for prediction, and splicing the pictures to the original size after obtaining a result.
4. Automatic repair of missing elements: because the trees have certain shielding to partial buildings, the outer walls and partial windows can be damaged, and the elements of the outer walls and partial windows need to be repaired. The invention adopts the repair based on CRF (Conditional Random Field) algorithm, which corrects the predicted image by combining the shape and color information of the original image.
Step five: model accuracy verification;
after the automatic identification of the wall and the window and the automatic repair of the missing part thereof are completed, the areas of the outer wall and the window on each predicted picture are automatically counted by a computer, and the window-wall ratio eta is calculated by the formula (1) i :
η i =S i,window /S i,wall (1)
Wherein i represents the i-th predicted picture, S i,window Representing the total area of windows in the ith predicted picture, S i,wall And the area of the outer wall in the ith predicted picture is shown. If more than 80% of the prediction sample errors are less than 10%, the precision is considered to meet the requirement, and the step six is executed; otherwise, returning to the first step and completing the first to fourth steps again.
Step six: establishing a prediction model library;
and (3) reserving the model network architecture and related setting parameters in the third step and the fourth step to form a final prediction model library.
Step seven: obtaining a picture of the elevation of the urban building;
the target building elevation picture can be obtained by shooting, for example: handheld mobile phone/camera shooting, vehicle-mounted or airborne camera shooting; or available through free map sites such as: hundred-degree panoramic pictures, so that the time for collecting the pictures can be greatly saved.
Step eight: predicting the elevation picture of the urban building;
putting the pictures obtained in the step seven into a model library in the step six, wherein the prediction method is the same as the step four, and specifically comprises outer wall prediction, window prediction and automatic repair of missing elements.
Step nine: obtaining the window wall ratio of the urban building;
According to the outer-wall and window areas finally obtained by the prediction and repair of step eight, the window-wall ratio of each target building is calculated by formula (1).
A specific embodiment of the method is as follows:
1. Obtain sample pictures. A researcher used a handheld mobile phone (with a photographing function) to take 85 training pictures meeting the requirements of step one in the Shaozhou Shore mountain area and in Nanjing city. In addition, 43 pictures whose actual window-wall ratios are known from government data were taken for prediction;
2. Label the training pictures. The windows and outer walls in the training pictures are labeled one by one at the pixel level using the free software "Eidolon Labeling Assistant" (as shown in FIG. 2). Each labeled window is given the label "outer window", and each labeled outer wall is given the label "outer wall". After labeling is completed, an XML file is exported;
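The exported XML is then read back for training. The exact schema produced by the labeling tool is not given in the patent; the sketch below assumes a Pascal-VOC-style layout with `<object><name>` and `<bndbox>` elements and rasterizes each box into a class mask (pixel-level polygon labels would be handled analogously). The `LABELS` encoding is an assumption.

```python
import numpy as np
import xml.etree.ElementTree as ET

LABELS = {"outer wall": 1, "outer window": 2}   # assumed class encoding

def xml_to_mask(xml_text, height, width):
    """Rasterize VOC-style bounding-box annotations into a label mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        xmin = int(box.findtext("xmin")); ymin = int(box.findtext("ymin"))
        xmax = int(box.findtext("xmax")); ymax = int(box.findtext("ymax"))
        # later objects overwrite earlier ones, so windows drawn on a wall win
        mask[ymin:ymax, xmin:xmax] = LABELS[name]
    return mask
```

Drawing the wall first and the windows second means window pixels overwrite wall pixels, matching how a facade annotation is layered.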
3. Picture recognition training. The Unet architecture set forth in step three (see FIG. 3) was constructed in the Python language and the relevant modifications were made. The XML file is imported for training and learning: by adding the relevant statements for executing Resize (scaling), recognition training of the outer wall is carried out and its position and color characteristics are learned; by adding the relevant statements for executing Split (segmentation), recognition training of the windows is carried out and their position and color characteristics are learned;
4. Sample picture prediction. Using the improved Unet architecture built in Python, the learned position and color characteristics of the outer wall and windows are loaded, the relevant statements for executing Resize (scaling) and Split (segmentation) are added, and the outer walls and windows in the prediction samples are predicted respectively; the results are shown in FIG. 4. After the prediction is finished, the relevant statements for executing the CRF algorithm are added to repair the contours of the outer walls and windows. Finally, the computer automatically counts the areas of the outer wall and windows in each picture, and the window-wall ratio is calculated by formula (1):
η_i = S_i,window / S_i,wall    (1)
where i denotes the i-th predicted picture, S_i,window denotes the total window area in the i-th predicted picture, and S_i,wall denotes the outer-wall area in the i-th predicted picture;
5. Error checking. The prediction error ε_i of each picture is counted one by one by formula (2):
ε_i = |η_i′ − η_i| / η_i    (2)
where η_i and η_i′ denote the actual and predicted window-wall ratios of the i-th predicted picture, respectively.
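Formula (2), applied across the whole prediction set, gives the per-picture errors and the pass rates used in the acceptance criterion. A small sketch (the sample ratios are illustrative; the actual per-picture values are not published in the patent):

```python
import numpy as np

def error_stats(actual, predicted):
    """Per-picture relative error eps_i = |eta_i' - eta_i| / eta_i (formula 2)
    and the fraction of pictures below the 6% and 10% thresholds."""
    eta = np.asarray(actual, dtype=float)
    eta_p = np.asarray(predicted, dtype=float)
    eps = np.abs(eta_p - eta) / eta
    return eps, np.mean(eps < 0.06), np.mean(eps < 0.10)
```

With 36 of 43 pictures under the 10% threshold, the pass rate is 36/43 ≈ 83.7%, which clears the 80% acceptance bar.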
Finally, of the 43 predicted pictures, 31 have an error of less than 6% and 36 have an error of less than 10%. Thus the identification error is less than 10% for 83.7% of the prediction samples, meeting the precision requirement. At this point, the prediction model library can be established from the model network architecture and the related setting parameters.
6. Predict the window-wall ratio of urban buildings. Urban building elevation pictures are obtained either by shooting, for example with a handheld mobile phone or camera or with a vehicle-mounted or airborne camera, or from free map websites such as Baidu panorama pictures, which greatly reduces picture-collection time. The building elevation photos of the target area are input into the prediction model library for automatic identification and repair, yielding the window-wall ratio of each target building. Since the invention focuses on elucidating the method and the prediction model library has already been established, extending the implementation to a whole city is merely routine work and is not described here.
The technical means of the invention are not limited to those disclosed in the embodiment; they also include technical schemes formed by any combination of the above technical features. It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the present invention, and such modifications are intended to fall within the scope of the present invention.
Claims (1)
1. An automatic recognition method for the window-wall ratio of urban buildings, characterized by comprising the following steps:
step one: shooting a sample building elevation to obtain a sample picture;
step two: labeling a sample picture;
step three: model construction and training;
step four: testing picture prediction;
step five: model accuracy verification;
step six: establishing a prediction model library;
step seven: obtaining a target building elevation picture;
step eight: automatically identifying the window wall ratio of the target building;
the requirements for the sample building elevation shooting in step one include: shooting the target elevation from a front view angle, avoiding upward-looking and tilted shots; shooting the elevation without occlusion; shooting the building elevation at close range;
the sample picture marking in the second step adopts pixel level marking;
the model construction and training in step three comprise model algorithm selection, Unet improvement, outer-wall recognition training and window recognition training, wherein the model is constructed with the improved Unet; the basic architecture of the Unet comprises three parts: feature extraction, up-sampling and channels; the Unet improvements are: in the feature extraction part, the number of sampling layers is increased and BN (batch normalization) layers are added; in the up-sampling part, parameters of a ResNet model pre-trained in the ILSVRC competition are called; sample pictures are used for model training: in the outer-wall recognition training the sample pictures are scaled and put into the model for training, and in the window recognition training the sample pictures are divided into small blocks in a window-sweeping manner and put into the model for training;
the test picture prediction in step four comprises the test picture shooting requirements, outer-wall prediction, window prediction and automatic repair of missing elements; in the test picture shooting requirements, a certain inclination of the building in the picture and occlusion by trees are allowed; in the outer-wall prediction, the test picture is compressed by scaling to the size specified for training pictures and put into the model for prediction, and after the result is obtained the picture is restored to its original size; in the window prediction, the test picture is divided in a window-sweeping manner, the division size being consistent with the size specified for training pictures; all the divided pictures are put into the model for prediction, and after the results are obtained the pictures are spliced back to the original size; in the automatic repair of missing elements, because trees partially occlude some buildings, the outer walls and some windows are incomplete and require element repair; the repair adopts the CRF algorithm, which corrects the test image by combining it with the shape and color information of the original image;
in the model accuracy verification in step five, test pictures are input into the model, the computer automatically counts the pixel areas of the outer wall and the windows on each test picture, and the window-wall ratio η_i is calculated by formula (1):
η_i = S_i,window / S_i,wall    (1)
wherein S_i,window denotes the total pixel area of the windows in the i-th picture, and S_i,wall denotes the total pixel area of the outer wall in the i-th picture;
the prediction error ε_i of each test picture is counted one by one by formula (2):
ε_i = |η_i′ − η_i| / η_i    (2)
wherein η_i and η_i′ denote the actual and predicted window-wall ratios of the i-th test picture, respectively;
if the error is less than 10% for more than 80% of the test pictures, the model precision is considered to meet the requirement and step six is executed; otherwise, return to step one and complete steps one to four again;
the prediction model library in step six is built according to the model architecture and parameters determined in steps three to five;
the target building elevation picture in step seven is obtained by shooting or from a map website;
in the automatic identification of the target building window-wall ratio in step eight, the target building elevation pictures are put into the prediction model library for prediction to obtain the window-wall ratio of each target building.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910964461.3A CN110956196B (en) | 2019-10-11 | 2019-10-11 | Automatic recognition method for window wall ratio of urban building |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110956196A CN110956196A (en) | 2020-04-03 |
CN110956196B true CN110956196B (en) | 2024-03-08 |
Family
ID=69975578
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910964461.3A Active CN110956196B (en) | 2019-10-11 | 2019-10-11 | Automatic recognition method for window wall ratio of urban building |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110956196B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275755B (en) * | 2020-04-28 | 2020-09-15 | 中国人民解放军总医院 | Mitral valve orifice area detection method, system and equipment based on artificial intelligence |
CN112200029B (en) * | 2020-09-27 | 2022-03-25 | 电子科技大学 | Remote sensing image building extraction method based on improved UNet + + network |
CN112613369B (en) * | 2020-12-15 | 2024-07-12 | 中国建筑第八工程局有限公司 | Building window area calculation method and system |
CN114549956B (en) * | 2022-02-11 | 2024-05-28 | 上海市测绘院 | Deep learning-assisted inclined model building outer elevation target recognition method |
CN114973297A (en) * | 2022-06-17 | 2022-08-30 | 广州市圆方计算机软件工程有限公司 | Wall area identification method, system, equipment and medium for planar house type graph |
CN117115243B (en) * | 2023-10-23 | 2024-02-09 | 北京科技大学 | Building group outer facade window positioning method and device based on street view picture |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012193505A (en) * | 2011-03-15 | 2012-10-11 | Sekisui Chem Co Ltd | Repair structure for drain riser pipe line of multistoried building, repair method for drain riser of multistoried building, and collective joint for use in repair structure and repair method |
CN103163181A (en) * | 2013-03-21 | 2013-06-19 | 山东省计算中心 | Automatic thermotechnical area identification method based on outdoor scene infrared image of building |
WO2013185401A1 (en) * | 2012-06-15 | 2013-12-19 | 深圳市华星光电技术有限公司 | Mask and correction method thereof |
CN105279787A (en) * | 2015-04-03 | 2016-01-27 | 北京明兰网络科技有限公司 | Method for generating three-dimensional (3D) building model based on photographed house type image identification |
CN106934140A (en) * | 2017-03-07 | 2017-07-07 | 西安理工大学 | Building energy conservation automatic checking method based on BIM |
CN107093205A (en) * | 2017-03-15 | 2017-08-25 | 北京航空航天大学 | A kind of three dimensions building window detection method for reconstructing based on unmanned plane image |
CN107622503A (en) * | 2017-08-10 | 2018-01-23 | 上海电力学院 | A Hierarchical Segmentation Method for Recovering Image Occlusion Boundaries |
CN108460833A (en) * | 2018-03-28 | 2018-08-28 | 中南大学 | A kind of information platform building traditional architecture digital protection and reparation based on BIM |
CN108596465A (en) * | 2018-04-17 | 2018-09-28 | 西安建筑科技大学 | A kind of urban residence building system carbon energy measuring method |
CN108763606A (en) * | 2018-03-12 | 2018-11-06 | 江苏艾佳家居用品有限公司 | A kind of floor plan element extraction method and system based on machine vision |
CN109446992A (en) * | 2018-10-30 | 2019-03-08 | 苏州中科天启遥感科技有限公司 | Remote sensing image building extracting method and system, storage medium, electronic equipment based on deep learning |
CN109992908A (en) * | 2019-04-08 | 2019-07-09 | 东南大学 | An urban building energy consumption simulation system |
Also Published As
Publication number | Publication date |
---|---|
CN110956196A (en) | 2020-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110956196B (en) | Automatic recognition method for window wall ratio of urban building | |
CN110059694B (en) | Intelligent identification method for character data in complex scene of power industry | |
CN110826549A (en) | Inspection robot instrument image identification method and system based on computer vision | |
CN110009637B (en) | Remote sensing image segmentation network based on tree structure | |
CN110929577A (en) | An improved target recognition method based on YOLOv3 lightweight framework | |
CN111104850B (en) | Remote sensing image building automatic extraction method and system based on residual error network | |
CN109740479A | Vehicle re-identification method, device, equipment and readable storage medium | |
CN113743358A (en) | Landscape visual feature recognition method based on all-dimensional acquisition and intelligent calculation | |
CN112000758B (en) | Three-dimensional urban building construction method | |
CN119027777A (en) | Target detection model, method and system for embedded part detection | |
CN113420109B (en) | Method for measuring permeability of street interface, computer and storage medium | |
CN112529003A (en) | Instrument panel digital identification method based on fast-RCNN | |
CN114529721A (en) | Urban remote sensing image vegetation coverage identification method based on deep learning | |
CN113762021A (en) | A Complex Pavement Crack Detection Method Based on Recurrent Neural Network and Attention Guidance Mechanism | |
CN111726535A (en) | Smart city CIM video big data image quality control method based on vehicle perception | |
CN117496426A (en) | Precast beam procedure identification method and device based on mutual learning | |
CN115223114A (en) | End-to-end vehicle attitude estimation method based on bidirectional fusion feature pyramid | |
CN115984693A (en) | Remote sensing image building extraction method based on deep learning and contour regularization | |
Tu | Eyes on the street: assessing window-to-wall ratios in Google street views using machine learning | |
CN118298184B (en) | Hierarchical error correction-based high-resolution remote sensing semantic segmentation method | |
CN118072164B (en) | A method, device, equipment and medium for identifying historical architectural styles | |
CN112861932B (en) | Rail plate crack detection method | |
CN115861197A (en) | Black smoke detection method and system based on neural network on loop detection line | |
CN118627288A (en) | Traffic flow micro-simulation method and system based on UAV image analysis | |
Bartoletti | Using Computer Vision for the Automatic Classification of Building Facades |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||