
CN117743931B - Soundscape prediction model training method and prediction method integrating multi-source urban data - Google Patents

Soundscape prediction model training method and prediction method integrating multi-source urban data

Info

Publication number
CN117743931B
CN117743931B (application CN202311763103.9A)
Authority
CN
China
Prior art keywords
training
soundscape
sound
features
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311763103.9A
Other languages
Chinese (zh)
Other versions
CN117743931A (en)
Inventor
涂伟
蔡钊悦
叶垚森
陈思琦
陈睿哲
余俊娴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202311763103.9A
Publication of CN117743931A
Application granted
Publication of CN117743931B
Legal status: Active

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to the technical field of soundscape prediction, and in particular to a soundscape prediction model training method and a prediction method that integrate multi-source urban data. Environmental data are statistically analyzed to obtain features related to the soundscape, i.e. soundscape training features, and a neural network model is trained on these features to obtain a soundscape prediction model. Environmental data of a geographic area whose soundscape is to be predicted are then collected, the soundscape prediction model is applied to them, and the model outputs the soundscape information of that area. Because the environmental data are existing data that do not need to be collected on site, soundscape information can be predicted from existing data alone, which broadens the application scenarios of the soundscape prediction method. Moreover, because a trained prediction model is used, the soundscape can be predicted comprehensively and accurately.

Description

Soundscape prediction model training method and prediction method integrating multi-source urban data
Technical Field
The invention relates to the technical field of soundscape prediction, and in particular to a soundscape prediction model training method and a soundscape prediction method that integrate multi-source urban data.
Background
A soundscape is the sound environment as perceived, experienced or understood by individuals or groups, and covers all sounds in the environment. Knowing the soundscape of an environment accurately provides technical support for improving urban soundscape quality and protecting residents' health. The prior art usually captures the soundscape with acoustic measurement instruments, but such instruments are geographically limited: they cannot be placed in every geographic area to measure the soundscape there.
In short, the prior art is limited in its ability to measure soundscapes.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
To solve the above technical problem, the invention provides a soundscape prediction model training method and a soundscape prediction method that integrate multi-source urban data, addressing the limitation of the prior art in measuring soundscapes.
To achieve the above purpose, the invention adopts the following technical solution:
In a first aspect, the invention provides a soundscape prediction model training method integrating multi-source urban data, including:
acquiring stored environmental data within a training geographic area, and performing statistical analysis on the environmental data to obtain soundscape training features of the training geographic area;
applying a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area;
acquiring measured soundscape labels of the training geographic area, calculating a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and training the neural network model according to the loss function to obtain a soundscape prediction model.
In one implementation, acquiring the stored environmental data within the training geographic area and performing statistical analysis on the environmental data to obtain the soundscape training features of the training geographic area includes:
acquiring point-of-interest data, road data, building data, street view images and greening data from the stored environmental data within the training geographic area;
performing statistical analysis on the point-of-interest data, road data, building data, street view images and greening data to obtain, for the training geographic area, the number of points of interest, the point-of-interest density, the road density, the shortest distance between the center position and a road, the building density, the floor area ratio, the green space coverage, the nearest green space distance, the nearest green space area, and the proportion of each element in the street view images;
taking the number of points of interest, point-of-interest density, road density, shortest distance, building density, floor area ratio, green space coverage, nearest green space distance, nearest green space area and street view element proportions as the soundscape training features of the training geographic area.
In one implementation, the neural network model includes a feature screening layer, a shared hidden layer, a sound intensity hidden layer and a sound source hidden layer, where the feature screening layer is cascaded with the shared hidden layer, the sound intensity hidden layer and the sound source hidden layer respectively, and the shared hidden layer is cascaded with the sound intensity hidden layer and the sound source hidden layer. Applying the neural network model to the soundscape training features to obtain the soundscape training labels output by the neural network model for the training geographic area includes:
inputting the soundscape training features into the feature screening layer to obtain the shared features, sound intensity features and sound source features that the feature screening layer selects from the soundscape training features, where the shared features are soundscape features related to both sound intensity and sound source, the sound intensity features are soundscape features related only to sound intensity, and the sound source features are soundscape features related only to the sound source;
after inputting the shared features into the shared hidden layer, inputting the sound intensity features into the sound intensity hidden layer and the sound source features into the sound source hidden layer, obtaining the sound intensity training labels output by the sound intensity hidden layer and the sound source training labels output by the sound source hidden layer, and taking the sound intensity training labels and the sound source training labels as the soundscape training labels.
In one implementation, acquiring the measured soundscape labels of the training geographic area, calculating the loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and training the neural network model according to the loss function to obtain the soundscape prediction model includes:
acquiring the measured sound intensity labels and the measured sound source labels from the measured soundscape labels;
calculating a sound intensity loss function from the difference between the sound intensity training labels and the measured sound intensity labels;
calculating a sound source loss function from the sound source training labels and the measured sound source labels;
adjusting parameters of the neural network model according to the sound intensity loss function and the sound source loss function until both loss functions are smaller than set values, to obtain the soundscape prediction model.
In one implementation, calculating the sound intensity loss function from the difference between the sound intensity training labels and the measured sound intensity labels includes:
calculating the mean square error between the sound intensity training labels and the measured sound intensity labels, and taking this mean square error as the sound intensity loss function.
In one implementation, calculating the sound source loss function from the sound source training labels and the measured sound source labels includes:
calculating the logarithm of each sound source training label;
multiplying each logarithm by the corresponding measured sound source label to obtain intermediate results;
performing a weighted calculation on the intermediate results to obtain the sound source loss function.
In a second aspect, an embodiment of the invention further provides a soundscape prediction method that applies the above soundscape prediction model, including:
acquiring environmental data within a geographic area to be predicted, and performing statistical analysis on the environmental data to obtain soundscape features of the geographic area to be predicted;
applying the soundscape prediction model to the soundscape features to obtain the sound source prediction label and the sound intensity prediction label output by the soundscape prediction model.
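As a rough illustration only (not part of the patent), the sketch below shows how a trained soundscape prediction model could be applied to the feature groups of an area to be predicted. The Keras framework, the model file name, the placeholder feature arrays and their dimensions are assumptions made for the example.

```python
import numpy as np
from tensorflow import keras

# Load a previously trained soundscape prediction model (file name is an assumption).
model = keras.models.load_model("soundscape_model.keras")

# Placeholder feature groups for the area to be predicted; in practice they come from
# the statistical analysis of stored environmental data described above.
x_shared = np.zeros((1, 5), dtype="float32")      # shared features
x_intensity = np.zeros((1, 2), dtype="float32")   # sound intensity features
x_source = np.zeros((1, 7), dtype="float32")      # sound source features

# The model outputs a sound intensity value and a probability over sound source classes.
intensity_pred, source_probs = model.predict([x_shared, x_intensity, x_source])
sound_intensity_label = float(intensity_pred[0, 0])             # predicted sound intensity (e.g. dB)
sound_source_label = int(np.argmax(source_probs, axis=-1)[0])   # predicted sound source class index
```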
In a third aspect, an embodiment of the invention further provides a soundscape prediction model training device integrating multi-source urban data, including:
a feature statistics module, configured to acquire the stored environmental data within a training geographic area and perform statistical analysis on the environmental data to obtain soundscape training features of the training geographic area;
a prediction module, configured to apply a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area;
a training module, configured to acquire measured soundscape labels of the training geographic area, calculate a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and train the neural network model according to the loss function to obtain a soundscape prediction model.
In a fourth aspect, an embodiment of the invention further provides a terminal device, including a memory, a processor, and a soundscape prediction model training program integrating multi-source urban data that is stored in the memory and executable on the processor; when the processor executes the program, the steps of the soundscape prediction model training method integrating multi-source urban data are implemented.
In a fifth aspect, an embodiment of the invention further provides a computer-readable storage medium on which a soundscape prediction model training program integrating multi-source urban data is stored; when the program is executed by a processor, the steps of the soundscape prediction model training method integrating multi-source urban data are implemented.
Beneficial effects: the invention statistically analyzes environmental data to obtain features related to the soundscape, i.e. soundscape training features, and trains a neural network model on these features to obtain a soundscape prediction model. Environmental data of the geographic area whose soundscape is to be predicted are acquired, the soundscape prediction model is applied to them, and the model outputs the soundscape information of that area. Because the environmental data already exist and need not be collected on site, soundscape information can be predicted from existing data alone, which broadens the application scenarios of the soundscape prediction method. In addition, using a trained prediction model allows the soundscape to be predicted comprehensively and accurately.
Drawings
FIG. 1 is an overall flow chart of the invention;
FIG. 2 is a schematic diagram of model training in an embodiment of the invention;
FIG. 3 is a simulation diagram of sound intensity prediction in an embodiment of the invention;
FIG. 4 is a diagram of the soundscape prediction model training device integrating multi-source urban data provided by the invention;
FIG. 5 is a schematic block diagram of the internal structure of a terminal device according to an embodiment of the invention.
Detailed Description
The technical solution of the invention is described clearly and completely below with reference to the embodiments and the drawings. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Research shows that a soundscape is the sound environment as perceived, experienced or understood by individuals or groups and covers all sounds in the environment; knowing the soundscape of an environment accurately provides technical support for improving urban soundscape quality and protecting residents' health. The prior art usually captures the soundscape with acoustic measurement instruments, but such instruments are geographically limited: they cannot be placed in every geographic area to measure the soundscape there.
To solve this technical problem, the invention provides a soundscape prediction model training method and a soundscape prediction method that integrate multi-source urban data, addressing the limitation of the prior art in measuring soundscapes. In a specific implementation, the stored environmental data within a training geographic area are first acquired and statistically analyzed to obtain the soundscape training features of the training geographic area; a neural network model is then applied to the soundscape training features to obtain the soundscape training labels output by the neural network model for the training geographic area; finally, the measured soundscape labels of the training geographic area are acquired, the loss function of the neural network model is calculated from the measured soundscape labels and the soundscape training labels, and the neural network model is trained according to the loss function to obtain the soundscape prediction model.
The soundscape prediction model training method integrating multi-source urban data can be applied to terminal equipment, which may be a terminal product with a data acquisition function, such as a computer. In this embodiment, as shown in FIG. 1, the method specifically includes the following steps:
S100, acquiring stored environmental data within a training geographic area, and performing statistical analysis on the environmental data to obtain soundscape training features of the training geographic area.
S200, applying a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area.
S300, acquiring measured soundscape labels of the training geographic area, calculating a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and training the neural network model according to the loss function to obtain a soundscape prediction model.
In one embodiment, step S100 includes the following steps S101 and S102:
S101, acquiring point-of-interest data, road data, building data, street view images and greening data from the stored environmental data within the training geographic area.
The point-of-interest data, road data, building data, street view images and greening data in FIG. 2 constitute the multi-source urban data.
The point-of-interest data are obtained from the AMap (Gaode Map) open platform and the Baidu Map open platform, and include the point-of-interest (POI) data and the area-of-interest (AOI) data in FIG. 2; the POIs include shopping malls, schools and stations, and the AOI data represent the density or area of the POIs.
The OSM road and building data are downloaded from the OpenStreetMap website. Street view images are obtained from the Baidu Map panorama platform. Urban green space areas are extracted by classifying Sentinel-2 high-resolution remote sensing imagery.
S102, performing statistical analysis on the point-of-interest data, road data, building data, street view images and greening data to obtain, for the training geographic area, the number of points of interest, the point-of-interest density, the road density, the shortest distance between the center position and a road, the building density, the floor area ratio, the green space coverage, the nearest green space distance, the nearest green space area, and the proportion of each element in the street view images; these quantities are taken as the soundscape training features of the training geographic area.
Number of points of interest: P_i is the number of points of interest of the i-th type within the grid of the training geographic area; for example, P_i may be the number of schools or shopping malls in the training geographic area.
Point-of-interest density D_P = Σ_i A_i / S, where A_i is the area of the i-th point of interest within the training geographic area and S is the total area of the training geographic area.
Road density D_R = L / S, where L is the total road length within the training geographic area.
Shortest distance Dist_{R,i} = min_j dist(c, r_{i,j}), where r_{i,j} is the j-th road of level i and dist(c, r_{i,j}) is the distance from the center point c of the training geographic area to r_{i,j}.
Building density D_B = Σ_{i=1..n} a_{b,i} / S, where n is the total number of buildings in the training geographic area and a_{b,i} is the footprint (base) area of the i-th building.
Floor area ratio FAR = Σ_{i=1..n} a_{f,i} / S, where a_{f,i} is the floor area of the i-th building.
Green space coverage FVC = (NDVI − NDVI_soil) / (NDVI_veg − NDVI_soil), where NDVI_soil is the NDVI value of a pure bare-soil pixel and NDVI_veg is the NDVI value of a pure vegetation pixel.
Nearest green space distance Dist_G = min_i dist(c, g_i), where g_i is the i-th green space outside the training geographic area and dist(c, g_i) is the distance between the center point c of the training geographic area and g_i.
Nearest green space area A_g: the area of the green space corresponding to the nearest green space distance Dist_G from the center point c of the training geographic area.
Street view element proportion r_i = T_i / T_n, where T_i is the pixel area occupied by the i-th class of elements in the street view image and T_n is the total pixel area of the street view image, i.e. the sum over all pixels of the street view image.
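As an illustration only, the following Python sketch computes several of the features defined above for one grid cell. The function name, argument layout and the assumption that all inputs have already been extracted per grid cell are not specified by the patent.

```python
import numpy as np

def soundscape_features(poi_areas, road_length, road_dists, building_footprints,
                        building_floor_areas, grid_area, ndvi, ndvi_soil, ndvi_veg,
                        green_dists, streetview_class_pixels):
    """Illustrative computation of several soundscape training features for one grid cell."""
    feats = {}
    feats["poi_density"] = float(np.sum(poi_areas)) / grid_area                  # D_P
    feats["road_density"] = road_length / grid_area                              # D_R
    feats["min_road_dist"] = float(np.min(road_dists))                           # Dist_R
    feats["building_density"] = float(np.sum(building_footprints)) / grid_area   # D_B
    feats["far"] = float(np.sum(building_floor_areas)) / grid_area               # floor area ratio
    feats["fvc"] = float(np.mean((ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)))   # green space coverage
    feats["nearest_green_dist"] = float(np.min(green_dists))                     # Dist_G
    total_px = float(np.sum(streetview_class_pixels))
    feats["streetview_ratios"] = (np.asarray(streetview_class_pixels, float) / total_px).tolist()
    return feats
```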
In one embodiment, the neural network model used in S200 is shown in FIG. 2 and includes a feature screening layer, a shared hidden layer, a sound intensity hidden layer and a sound source hidden layer, where the feature screening layer is cascaded with the shared hidden layer, the sound intensity hidden layer and the sound source hidden layer, and the shared hidden layer is cascaded with the sound intensity hidden layer and the sound source hidden layer. The feature screening layer corresponds to the feature engineering layer in FIG. 2. Step S200 includes the following steps S201 and S202:
S201, inputting the soundscape training features into the feature screening layer to obtain the shared features, sound intensity features and sound source features that the feature screening layer selects from the soundscape training features, where the shared features are soundscape features related to both sound intensity and sound source, the sound intensity features are soundscape features related only to sound intensity, and the sound source features are soundscape features related only to the sound source.
The feature screening layer screens and classifies the soundscape features obtained in step S102 and divides them into sound source features, sound intensity features and shared features. The shared features include the road density D_R from step S102, the shortest distance to secondary (level-2) roads, the green space coverage FVC, the traffic facility density and the scenic spot density; the sound intensity features include the nearest green space area A_g and the street view element proportions; the sound source features include the shortest distances to level-1, level-3 and level-4 roads, the building density D_B, the floor area ratio FAR, the point-of-interest density and the number of points of interest.
S202, after inputting the shared features into the shared hidden layer, inputting the sound intensity features into the sound intensity hidden layer and the sound source features into the sound source hidden layer, obtaining the sound intensity training labels output by the sound intensity hidden layer and the sound source training labels output by the sound source hidden layer, and taking the sound intensity training labels and the sound source training labels as the soundscape training labels.
That is, the shared features are input first; after they reach the shared hidden layer, the sound source features and the sound intensity features are input into the sound source hidden layer and the sound intensity hidden layer respectively, so that the three groups of features are fed into the neural network model in batches.
The shared hidden layer, the sound intensity hidden layer and the sound source hidden layer are all fully connected layers. The shared hidden layer receives the features selected by feature screening; for the features required by both the sound source prediction task and the sound intensity prediction task, the shared hidden layer (a fully connected layer) is designed to extract the parts that the two tasks have in common. The sound intensity hidden layer and the sound source hidden layer are task-specific feature layers: after the shared hidden layer, two separate hidden layers (fully connected layers) are designed for the task-specific features, so that each task learns the features in which it differs from the other.
Because the training sample size is small and multiple fully connected layers easily cause over-fitting, a Drop-out layer is added after the sound source hidden layer and after the sound intensity hidden layer to improve the generalization of the model. A softmax activation function is used in the output layer cascaded with the sound source hidden layer, and a linear activation function is used in the output layer cascaded with the sound intensity hidden layer.
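The following Keras sketch is one possible reading of this architecture: three feature inputs, a shared fully connected hidden layer, task-specific hidden layers cascaded after it, Drop-out, a linear sound intensity output and a softmax sound source output. Layer sizes, the drop-out rate and the use of concatenation to cascade the shared hidden layer with the task-specific layers are illustrative assumptions, not values taken from the patent.

```python
from tensorflow.keras import layers, Model

# Illustrative feature dimensions and number of sound source classes (assumptions).
N_SHARED, N_INTENSITY, N_SOURCE, N_CLASSES = 5, 2, 7, 4

shared_in = layers.Input(shape=(N_SHARED,), name="shared_features")
intensity_in = layers.Input(shape=(N_INTENSITY,), name="intensity_features")
source_in = layers.Input(shape=(N_SOURCE,), name="source_features")

# Shared hidden (fully connected) layer: learns what the two tasks have in common.
shared_hidden = layers.Dense(32, activation="relu", name="shared_hidden")(shared_in)

# Task-specific hidden layers cascaded after the shared hidden layer,
# each followed by a Drop-out layer to curb over-fitting on small samples.
intensity_hidden = layers.Dense(16, activation="relu", name="intensity_hidden")(
    layers.Concatenate()([shared_hidden, intensity_in]))
intensity_hidden = layers.Dropout(0.3)(intensity_hidden)

source_hidden = layers.Dense(16, activation="relu", name="source_hidden")(
    layers.Concatenate()([shared_hidden, source_in]))
source_hidden = layers.Dropout(0.3)(source_hidden)

# Output layers: linear activation for sound intensity, softmax for the sound source class.
intensity_out = layers.Dense(1, activation="linear", name="sound_intensity")(intensity_hidden)
source_out = layers.Dense(N_CLASSES, activation="softmax", name="sound_source")(source_hidden)

model = Model(inputs=[shared_in, intensity_in, source_in],
              outputs=[intensity_out, source_out])
```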
The sound intensity training label output by the neural network model is illustrated in FIG. 3; that is, the neural network model predicts a sound intensity value, which serves as the sound intensity training label. The neural network model also outputs a sound source training label: the sound source category is represented by a numerical label, and the sound source categories are listed in Table 1.
TABLE 1
Sound source class — Detailed categories
Natural sound — bird song, wave sound, insect chirping, rustling of leaves in the wind
Human sound — talking, footsteps and other movement sounds, broadcast announcements, singing
Traffic sound — motor vehicle driving sound, horns and whistles
Mechanical sound — construction sound, cargo handling sound
In one embodiment, step S300 includes the following steps S301 to S306:
S301, acquiring the measured sound intensity label and the measured sound source label from the measured soundscape labels.
The measured sound intensity label and the measured sound source label are the sound intensity level and the sound source category of the training geographic area, collected as follows:
the sound level meter is pointed at the sound source; the longitude and latitude of the sampling point, the sampling point number and the location of the sampling point are recorded; and the data are recorded and synchronously displayed by means of an application program on the mobile device to which the sound level meter is connected.
The measured sound intensity label is the average of the sound intensity values measured at the sampling points.
S302, calculating the root mean square error between the sound intensity training labels ŷ_i and the measured sound intensity labels y_i, RMSE = sqrt( (1/n) · Σ_{i=1..n} (ŷ_i − y_i)² ), where n is the total number of sampling points, and taking this RMSE as the sound intensity loss function.
S303, calculating the logarithm log(p_i) of each sound source training label p_i.
S304, multiplying each logarithm by the corresponding measured sound source label y_i to obtain the intermediate results y_i·log(p_i).
S305, performing a weighted calculation on the intermediate results to obtain the sound source loss function H(y, p) = −Σ_i w_i · y_i · log(p_i), i.e. a weighted cross-entropy between the measured sound source labels and the sound source training labels, where w_i denotes the weight of the i-th term.
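A minimal NumPy sketch of the two loss terms described in S302–S305 is given below; the class-weight argument and the numerical clipping are assumptions added for the example, not details given by the patent.

```python
import numpy as np

def sound_intensity_rmse(y_true_db, y_pred_db):
    """Root mean square error between measured and predicted sound intensity (S302)."""
    y_true_db = np.asarray(y_true_db, dtype=float)
    y_pred_db = np.asarray(y_pred_db, dtype=float)
    return float(np.sqrt(np.mean((y_pred_db - y_true_db) ** 2)))

def sound_source_cross_entropy(y_true_onehot, p_pred, class_weights=None, eps=1e-12):
    """Weighted cross-entropy built from steps S303-S305: the log of each predicted
    probability is multiplied by the measured one-hot label, then weighted and summed."""
    y_true_onehot = np.asarray(y_true_onehot, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0)   # avoid log(0)
    if class_weights is None:
        class_weights = np.ones(y_true_onehot.shape[-1])
    per_sample = -np.sum(class_weights * y_true_onehot * np.log(p_pred), axis=-1)
    return float(np.mean(per_sample))
```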
S306, adjusting the parameters of the neural network model according to the sound intensity loss function and the sound source loss function until both loss functions are smaller than the set values, to obtain the soundscape prediction model.
During model training, the loss functions of the sound intensity prediction task and the sound source prediction task are kept separate, the minimum of each is sought by back-propagation, and model training is complete when both loss functions reach their minima.
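Continuing the model sketch above, training with the two task losses kept separate could look as follows in Keras; the optimizer, loss weights, number of epochs and batch size are illustrative assumptions, and x_*_train / y_*_train are assumed to be the prepared feature groups and measured labels.

```python
# Mean squared error for sound intensity, categorical cross-entropy for the sound source class;
# both are minimised jointly by back-propagation.
model.compile(
    optimizer="adam",
    loss={"sound_intensity": "mse",
          "sound_source": "categorical_crossentropy"},
    loss_weights={"sound_intensity": 1.0, "sound_source": 1.0},
)

history = model.fit(
    x=[x_shared_train, x_intensity_train, x_source_train],   # feature groups from the screening layer
    y={"sound_intensity": y_intensity_train,                  # measured sound intensity values
       "sound_source": y_source_train},                       # one-hot measured sound source classes
    validation_split=0.2,
    epochs=200,
    batch_size=16,
)
```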
In one embodiment, after model training is completed, the soundscape prediction model obtained by training is evaluated. Specifically, the sound source prediction task of the model is evaluated with the accuracy Acc, precision PREC, recall REC and F1 functions, and the sound intensity prediction task is evaluated with the R² function:
Acc = (TP + TN) / (TP + TN + FP + FN), PREC = TP / (TP + FP), REC = TP / (TP + FN), F1 = 2·PREC·REC / (PREC + REC),
where, for a given sound source class, TP is the number of samples whose sound source prediction label output by the model matches the measured sound source label of that class, FP is the number of samples predicted as that class whose measured label belongs to a different class, FN is the number of samples of that class that the model predicts as a different class, and TN is the number of samples that neither the prediction nor the measurement assigns to that class.
R² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)², where ȳ is the average of the measured sound intensity labels.
The accuracy Acc, precision PREC, recall REC, F1 and R² functions, together with the RMSE function, are used to construct a composite index ε:
ε = w_1·Acc_norm + w_2·PREC_norm + w_3·REC_norm + w_4·F1_norm + w_5·RMSE_norm + w_6·R²_norm,
where Acc_norm is the normalization of Acc, PREC_norm the normalization of PREC, REC_norm the normalization of REC, F1_norm the normalization of F1, RMSE_norm the normalization of RMSE, and R²_norm the normalization of R².
The normalization formula is X_norm = (X − X_min) / (X_max − X_min), where X is any one of the six indices, X_min is the minimum value of that index, and X_max is its maximum value.
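A hedged sketch of the evaluation and the composite index ε, using scikit-learn metrics; the equal weights, the macro averaging for the multi-class metrics and the externally supplied normalisation bounds are assumptions for the example rather than values fixed by the patent.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, r2_score, mean_squared_error)

def composite_index(y_src_true, y_src_pred, y_db_true, y_db_pred,
                    weights=(1 / 6,) * 6, bounds=None):
    """Compute the six evaluation metrics, min-max normalise them and combine them into ε."""
    raw = {
        "acc": accuracy_score(y_src_true, y_src_pred),
        "prec": precision_score(y_src_true, y_src_pred, average="macro", zero_division=0),
        "rec": recall_score(y_src_true, y_src_pred, average="macro", zero_division=0),
        "f1": f1_score(y_src_true, y_src_pred, average="macro", zero_division=0),
        "rmse": float(np.sqrt(mean_squared_error(y_db_true, y_db_pred))),
        "r2": r2_score(y_db_true, y_db_pred),
    }
    if bounds is None:                       # trivial bounds: leave values unchanged
        bounds = {k: (0.0, 1.0) for k in raw}
    norm = {k: (v - bounds[k][0]) / (bounds[k][1] - bounds[k][0]) for k, v in raw.items()}
    order = ["acc", "prec", "rec", "f1", "rmse", "r2"]
    return float(sum(w * norm[k] for w, k in zip(weights, order)))
```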
By balancing the contributions of the different indices, the composite performance index ε provides a more comprehensive performance assessment of soundscape prediction.
In conclusion, the soundscape prediction model constructed by the invention can handle the sound source classification task and the sound intensity prediction task at the same time, i.e. it predicts the sound source category and the sound intensity simultaneously. Using this model to predict the urban soundscape of unknown areas enables comprehensive and accurate prediction of the urban soundscape.
The invention makes full use of information from different urban dimensions, including building, road network and street view data, and compensates for the shortcomings of a single data source, so that the characteristic information of the urban soundscape can be captured more comprehensively and the urban soundscape can be predicted more efficiently and accurately.
The invention exploits the relationship between the sound source category features and the sound intensity features by establishing a shared feature layer and task-specific feature layers, thereby predicting the sound source category and the sound intensity simultaneously. After training on a small sample, the model can predict the urban soundscape over a large area from the urban soundscape features alone, without any sound data, which improves the overall efficiency of soundscape prediction.
This embodiment also provides a soundscape prediction model training device integrating multi-source urban data; as shown in FIG. 4, the training device includes the following components:
a feature statistics module 01, configured to acquire the stored environmental data within a training geographic area and perform statistical analysis on the environmental data to obtain soundscape training features of the training geographic area;
a prediction module 02, configured to apply a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area;
a training module 03, configured to acquire measured soundscape labels of the training geographic area, calculate a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and train the neural network model according to the loss function to obtain a soundscape prediction model.
Based on the above embodiments, the invention further provides a terminal device, whose functional block diagram may be as shown in FIG. 5. The terminal device includes a processor, a memory, a network interface and a display screen connected through a system bus. The processor of the terminal device provides computing and control capabilities. The memory of the terminal device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program. The network interface of the terminal device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements the soundscape prediction model training method integrating multi-source urban data. The display screen of the terminal device may be a liquid crystal display or an electronic ink display.
Those skilled in the art will appreciate that the functional block diagram shown in FIG. 5 is merely a block diagram of part of the structure related to the solution of the invention and does not limit the terminal devices to which the solution is applied; a particular terminal device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a terminal device is provided, including a memory, a processor, and a soundscape prediction model training program integrating multi-source urban data that is stored in the memory and executable on the processor. When the processor executes the program, the following operations are implemented:
acquiring stored environmental data within a training geographic area, and performing statistical analysis on the environmental data to obtain soundscape training features of the training geographic area;
applying a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area;
acquiring measured soundscape labels of the training geographic area, calculating a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and training the neural network model according to the loss function to obtain a soundscape prediction model.
Those skilled in the art will appreciate that all or part of the above methods can be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the procedures of the method embodiments described above. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the invention and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (8)

1. A soundscape prediction model training method integrating multi-source urban data, comprising:
acquiring stored environmental data within a training geographic area, and performing statistical analysis on the environmental data to obtain soundscape training features of the training geographic area;
applying a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area;
acquiring measured soundscape labels of the training geographic area, calculating a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and training the neural network model according to the loss function to obtain a soundscape prediction model, the soundscape prediction model predicting the sound source category and the sound intensity simultaneously;
wherein acquiring the stored environmental data within the training geographic area and performing statistical analysis on the environmental data to obtain the soundscape training features of the training geographic area comprises:
acquiring point-of-interest data, road data, building data, street view images and greening data from the stored environmental data within the training geographic area;
performing statistical analysis on the point-of-interest data, the road data, the building data, the street view images and the greening data to obtain, for the training geographic area, the number of points of interest, the point-of-interest density, the road density, the shortest distance between the center position and a road, the building density, the floor area ratio, the green space coverage, the nearest green space distance, the nearest green space area and the proportion of each element in the street view images;
taking the number of points of interest, the point-of-interest density, the road density, the shortest distance, the building density, the floor area ratio, the green space coverage, the nearest green space distance, the nearest green space area and the street view element proportions as the soundscape training features of the training geographic area;
wherein the neural network model comprises a feature screening layer, a shared hidden layer, a sound intensity hidden layer and a sound source hidden layer, the feature screening layer being cascaded with the shared hidden layer, the sound intensity hidden layer and the sound source hidden layer respectively, and the shared hidden layer being cascaded with the sound intensity hidden layer and the sound source hidden layer respectively; and applying the neural network model to the soundscape training features to obtain the soundscape training labels output by the neural network model for the training geographic area comprises:
inputting the soundscape training features into the feature screening layer to obtain shared features, sound intensity features and sound source features selected by the feature screening layer from the soundscape training features, the shared features being soundscape features related to both sound intensity and sound source, the sound intensity features being soundscape features related only to sound intensity, and the sound source features being soundscape features related only to the sound source;
after inputting the shared features into the shared hidden layer, inputting the sound intensity features into the sound intensity hidden layer and the sound source features into the sound source hidden layer, obtaining the sound intensity training labels output by the sound intensity hidden layer and the sound source training labels output by the sound source hidden layer, and taking the sound intensity training labels and the sound source training labels as the soundscape training labels.
2. The soundscape prediction model training method integrating multi-source urban data according to claim 1, wherein acquiring the measured soundscape labels of the training geographic area, calculating the loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and training the neural network model according to the loss function to obtain the soundscape prediction model comprises:
acquiring the measured sound intensity labels and the measured sound source labels from the measured soundscape labels;
calculating a sound intensity loss function from the difference between the sound intensity training labels and the measured sound intensity labels;
calculating a sound source loss function from the sound source training labels and the measured sound source labels;
adjusting parameters of the neural network model according to the sound intensity loss function and the sound source loss function until both the sound intensity loss function and the sound source loss function are smaller than set values, to obtain the soundscape prediction model.
3. The soundscape prediction model training method integrating multi-source urban data according to claim 2, wherein calculating the sound intensity loss function from the difference between the sound intensity training labels and the measured sound intensity labels comprises:
calculating the mean square error between the sound intensity training labels and the measured sound intensity labels, and taking the mean square error as the sound intensity loss function.
4. The soundscape prediction model training method integrating multi-source urban data according to claim 2, wherein calculating the sound source loss function from the sound source training labels and the measured sound source labels comprises:
calculating the logarithm of each sound source training label;
multiplying each logarithm by the corresponding measured sound source label to obtain intermediate results;
performing a weighted calculation on the intermediate results to obtain the sound source loss function.
5. A soundscape prediction method applying the soundscape prediction model according to any one of claims 1-4, comprising:
acquiring environmental data within a geographic area to be predicted, and performing statistical analysis on the environmental data to obtain soundscape features of the geographic area to be predicted;
applying the soundscape prediction model to the soundscape features to obtain the sound source prediction label and the sound intensity prediction label output by the soundscape prediction model.
6. A soundscape prediction model training device integrating multi-source urban data, comprising:
a feature statistics module, configured to acquire stored environmental data within a training geographic area and perform statistical analysis on the environmental data to obtain soundscape training features of the training geographic area;
a prediction module, configured to apply a neural network model to the soundscape training features to obtain soundscape training labels output by the neural network model for the training geographic area;
a training module, configured to acquire measured soundscape labels of the training geographic area, calculate a loss function of the neural network model from the measured soundscape labels and the soundscape training labels, and train the neural network model according to the loss function to obtain a soundscape prediction model, the soundscape prediction model predicting the sound source category and the sound intensity simultaneously;
wherein acquiring the stored environmental data within the training geographic area and performing statistical analysis on the environmental data to obtain the soundscape training features of the training geographic area comprises: acquiring point-of-interest data, road data, building data, street view images and greening data from the stored environmental data within the training geographic area; performing statistical analysis on the point-of-interest data, the road data, the building data, the street view images and the greening data to obtain, for the training geographic area, the number of points of interest, the point-of-interest density, the road density, the shortest distance between the center position and a road, the building density, the floor area ratio, the green space coverage, the nearest green space distance, the nearest green space area and the proportion of each element in the street view images; and taking these quantities as the soundscape training features of the training geographic area;
wherein the neural network model comprises a feature screening layer, a shared hidden layer, a sound intensity hidden layer and a sound source hidden layer, the feature screening layer being cascaded with the shared hidden layer, the sound intensity hidden layer and the sound source hidden layer respectively, and the shared hidden layer being cascaded with the sound intensity hidden layer and the sound source hidden layer respectively; and applying the neural network model to the soundscape training features to obtain the soundscape training labels output by the neural network model for the training geographic area comprises: inputting the soundscape training features into the feature screening layer to obtain shared features, sound intensity features and sound source features selected by the feature screening layer from the soundscape training features, the shared features being soundscape features related to both sound intensity and sound source, the sound intensity features being soundscape features related only to sound intensity, and the sound source features being soundscape features related only to the sound source; and after inputting the shared features into the shared hidden layer, inputting the sound intensity features into the sound intensity hidden layer and the sound source features into the sound source hidden layer, obtaining the sound intensity training labels output by the sound intensity hidden layer and the sound source training labels output by the sound source hidden layer, and taking the sound intensity training labels and the sound source training labels as the soundscape training labels.
7. A terminal device, comprising a memory, a processor, and a soundscape prediction model training program integrating multi-source urban data that is stored in the memory and executable on the processor, wherein when the processor executes the soundscape prediction model training program integrating multi-source urban data, the steps of the soundscape prediction model training method integrating multi-source urban data according to any one of claims 1-4 are implemented.
8. A computer-readable storage medium on which a soundscape prediction model training program integrating multi-source urban data is stored, wherein when the soundscape prediction model training program integrating multi-source urban data is executed by a processor, the steps of the soundscape prediction model training method integrating multi-source urban data according to any one of claims 1-4 are implemented.
CN202311763103.9A 2023-12-19 2023-12-19 Soundscape prediction model training method and prediction method integrating multi-source urban data Active CN117743931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311763103.9A CN117743931B (en) 2023-12-19 2023-12-19 Soundscape prediction model training method and prediction method integrating multi-source urban data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311763103.9A CN117743931B (en) 2023-12-19 2023-12-19 Soundscape prediction model training method and prediction method integrating multi-source urban data

Publications (2)

Publication Number Publication Date
CN117743931A CN117743931A (en) 2024-03-22
CN117743931B true CN117743931B (en) 2024-11-12

Family

ID=90252353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311763103.9A Active CN117743931B (en) 2023-12-19 2023-12-19 Soundscape prediction model training method and prediction method integrating multi-source urban data

Country Status (1)

Country Link
CN (1) CN117743931B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119918605A (en) * 2024-09-12 2025-05-02 深圳大学 A fast wind environment prediction method based on physical consistency neural network
CN119293478A (en) * 2024-09-19 2025-01-10 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) A method and system for urban soundscape perception extraction based on text big data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545427A (en) * 2022-09-20 2022-12-30 中山大学 Ecological land use protection method and system based on sound landscape intelligent analysis

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150382123A1 (en) * 2014-01-16 2015-12-31 Itamar Jobani System and method for producing a personalized earphone
CN106714099B (en) * 2015-11-16 2020-11-03 阿里巴巴(中国)有限公司 Photo information processing and scenic spot identification method, client and server
EP3807870B1 (en) * 2018-06-12 2023-10-25 Harman International Industries, Incorporated System for adaptive magnitude vehicle sound synthesis
CN111883350B (en) * 2020-07-21 2021-11-05 国网河南省电力公司电力科学研究院 Transformer substation noise frequency selection suppression device considering temperature factors and frequency selection method
CN112735443B (en) * 2020-12-25 2024-06-07 浙江弄潮儿智慧科技有限公司 Ocean space resource management system with automatic classification function and automatic classification method thereof
CN113782053B (en) * 2021-09-04 2023-09-22 天津大学 Automatic monitoring method for urban soundscape quality worthy of protection
CN114961352A (en) * 2022-05-24 2022-08-30 中建科技集团有限公司 Sound scene and interaction device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545427A (en) * 2022-09-20 2022-12-30 中山大学 Ecological land use protection method and system based on sound landscape intelligent analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Urban Soundscape Evaluation in Shenzhen Based on Artificial Neural Networks; Xu Dongchao (徐东超); China Master's Theses Full-text Database, Engineering Science and Technology II; 2020-02-15; pp. 12-27, 44-59 *

Also Published As

Publication number Publication date
CN117743931A (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN117743931B (en) Soundscape prediction model training method and prediction method integrating multi-source urban data
CN111242493B (en) A street quality evaluation method, device, system and storage medium
Krause et al. Measuring and interpreting the temporal variability in the soundscape at four places in Sequoia National Park
Alamgir et al. Downscaling and projection of spatiotemporal changes in temperature of Bangladesh
CN110610190A (en) A Convolutional Neural Network Rain Intensity Classification Method for Rainy Day Images
Grubeša et al. Mobile crowdsensing accuracy for noise mapping in smart cities
CN114936691B (en) A temperature prediction method integrating correlation weighting and spatiotemporal attention
Makwana et al. Hydrological stream flow modelling using soil and water assessment tool (SWAT) and neural networks (NNs) for the Limkheda watershed, Gujarat, India
CN113222316A (en) Change scene simulation method based on FLUS model and biodiversity model
CN110956412A (en) Flood dynamic assessment method, device, medium and equipment based on real-scene model
Nourani et al. Sensitivity analysis and ensemble artificial intelligence-based model for short-term prediction of NO2 concentration
Theochari et al. Hydrometeorological-hydrometric station network design using multicriteria decision analysis and GIS techniques
Can et al. CENSE Project: general overview
CN116449460B (en) Regional month precipitation prediction method and system based on convolution UNet and transfer learning
Huang et al. Estimating urban noise along road network from street view imagery
CN118014297A (en) An intelligent evaluation method and system for supply and demand responsiveness of outdoor fitness facilities
Maneechot et al. Evaluating the necessity of post-processing techniques on d4PDF data for extreme climate assessment
Rashid et al. Understanding hurricane evacuation behavior from Facebook data
Su et al. An evaluation of two statistical downscaling models for downscaling monthly precipitation in the Heihe River basin of China: H. Su et al.
Sofianopoulos et al. Citizens as Environmental Sensors: Noise Mapping and Assessment on Lemnos Island, Greece, Using VGI and Web Technologies
WO2021017445A1 (en) Convolutional neural network rainfall intensity classification method and quantification method aimed at rainy pictures
Fei et al. Adapting public annotated data sets and low-quality dash cameras for spatiotemporal estimation of traffic-related air pollution: A transfer-learning approach
CN116562125A (en) A method and system for simulating and predicting the sense of street safety by coupling the characteristics of the crowd and the characteristics of the built environment of the street
Sonnenschein et al. Hybrid cellular automata-based air pollution model for traffic scenario microsimulations
Singh Modeling stream flow with prediction uncertainty by using SWAT hydrologic and RBNN models for an agricultural watershed in India

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared