Soundscape Method
Soundscape--The Case of Fuzhou City's Main Urban Area
Abstract
Soundscape is an important part of the urban landscape and plays a key role in the health and well-being of citizens. Predicting the soundscape over a large area at fine resolution remains a great challenge. In this study, street view images are used with machine learning algorithms to estimate the large-area urban soundscape, replacing traditional large-scale noise measurement, which is time-consuming and laborious. First, a computer vision method is applied to extract landscape visual feature indicators from large-area streetscape images. Second, the 15 soundscape indicators collected are correlated with the landscape visual indicators to construct a prediction model, which is finally applied to estimate the large-area urban soundscape. Empirical evidence from 98,000 street view images in Fuzhou City shows that street view images can be used to predict the street soundscape, a finding that supports the effectiveness of machine-learning-based street view image methods for soundscape prediction.
Keywords: streetscape imagery, soundscape quantization, cityscape, spatial analysis, deep learning, Fuzhou
1. Introduction
Urban streets are an important part of urban public space, serving not only as transportation corridors but also as an important means of strengthening urban social ties, promoting social interaction, and improving the quality of life of urban residents (Appleyard and Lintell, 1972; Hassen and Kaufman, 2016). The urban street acoustic environment is a key influence on the streetscape experience; it not only affects people's quality of life but also reflects the city's culture and environment (Goines and Hagler, 2007; Skånberg and Öhrström, 2002; Sun et al., 2019). Studies show that unpleasant sound can lead to cardiovascular disease, sleep problems, irritability, and cognitive impairment in children (Daiber et al., 2019; Jenkins et al., 1981; Meecham and Smith, 1977), whereas pleasant sounds can promote public health (Andringa and Lanser, 2013; Meng et al., 2021; Sammler et al., 2007). Conventional acoustic meters usually measure only the physical properties of sound; however, human perception of sound and its impact on health depend not only on these physical properties but also on the individual's subjective perception and mental state (Nilsson and Berglund, 2006). As emphasized in ISO 12913-1 (Brooks, 2016), the environment plays a key role in soundscape assessment and design, which focuses on human perception of the environment rather than physical measurements. Improving the perceived quality of the soundscape is therefore important for improving health (Hasegawa and Lau, 2022a; Herranz-Pascual et al., 2010).
Soundscapes are an important factor related to audio-visual perception and human health (Shi, 2021). Soundscape research follows three main directions, all aimed at studying how people perceive the acoustic environment (Brown, 2011). The first analyzes recordings through physical parameters to obtain soundscape information objectively (Barber et al., 2011). The second obtains subjective soundscape information through questionnaires, interviews, and field observations (Liu et al., 2013; Liu et al., 2013a). The third combines subjective and objective approaches (Jeon et al., 2010). Various methods have been proposed to measure and evaluate the soundscape in order to improve the quality of the urban soundscape, such as sound level meters and noise sensors deployed at specific locations to provide accurate soundscape data, but this approach has several limitations (Gasco et al., 2020; Verma et al., 2019). First, deploying sensors is very costly, requiring the purchase and installation of a large number of devices. Second, sensors can only cover a limited area. To overcome these problems, researchers are developing inexpensive, large-scale soundscape assessment methods that use new data sources such as smartphones and social media data (Gasco et al., 2017; Gasco et al., 2019). While these methods are real-time, large-scale, low-cost, and individualized, smartphone and social media data may be subject to sampling bias because not everyone uses a smartphone or social media, or
Corresponding author: College of Landscape Architecture and Art, Fujian Agriculture and Forestry
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4514396
Urban street imagery creates opportunities to advance multi-scale urban research because of its broad coverage and fine spatial sampling. It has been used to quantify urban greenery (Hawes et al., 2022; Long and Liu, 2017; Wu et al., 2020), urban climate (Ignatius et al., 2022), tourist behavior (Guo and Loo, 2013; Ning et al., 2022), building characteristics and distribution (Kelly et al., 2013; Keralis et al., 2020; Nguyen et al., 2019), traffic (Wang et al., 2022), road accessibility (Ewing and Cervero, 2010; Hara et al., 2013), safety (Song et al., 2020; Zhanjun et al., 2022), crime (Branas et al., 2018; McKee et al., 2017; Perkins et al., 1992), and urban perception (Dubey et al., 2016; Guan et al., 2022; Kruse et al., 2021). Computer vision techniques and algorithms play an important role in street view image processing and analysis. For example, semantic segmentation is an important deep learning task in computer vision and is mainly used for urban feature extraction. Using a convolutional network, it converts a two-dimensional image into a pixel-level label map, which enables segmentation and classification of different objects and regions in the image. Models commonly used in this line of work include YOLO, SegNet, VGGNet and DeepLab. In addition, CV models for object detection and image classification can also efficiently extract high-level features from images (Verma et al., 2020a). Studies have automatically identified hazardous scenes related to non-motorized transportation and their immediate causes from street view images (SVI) using object detection and classification (Wang et al., 2022). Further, urban features extracted from SVI by computer vision models can efficiently estimate hidden community socio-economic conditions such as travel behavior, poverty status, health outcomes and behaviors, and crime (Fan et al., 2023). This provides the basis for this study to predict the urban soundscape from street view imagery.
Human visual and auditory perception are inextricably linked, and streetscape perception is influenced not only by visual components but also by acoustic components (Einhäuser et al., 2020; Salem et al., 2018a; Verma et al., 2020b). Previous research has shown a strong correlation between soundscape and visual aesthetics (Carles et al., 1999; Liu et al., 2013b; Meng and Kang, 2015; Meng et al., 2017; Schroeder and Anderson, 1984). For example, Carles et al. used 36 sounds and images to study the interaction between visual and auditory stimuli, and their results suggest that consistency (or coherence) between sounds and images affects landscape preferences (Carles et al., 1999). However, these studies have mainly explored the correlation between the two and have not predicted or further quantified soundscape metrics. This study focuses on high-resolution quantification and prediction of the street soundscape at the city level.
Therefore, this study aims to investigate how streetscape images can be used for large-scale street soundscape assessment and prediction. Specifically, it poses two research questions: (1) How can different soundscape metrics be acquired at high resolution at the city level? (2) What is the relationship between landscape visual elements in streetscape images and soundscape metrics? To answer these questions, we first extracted pixel features, semantic segmentation results, and object detection results from urban streetscape images using computer vision and deep learning models. Then, 15 soundscape indicators were constructed from four aspects: sound intensity, soundscape quality, sound source, and human perception. Based on these metrics, Section 4 predicts the soundscape metrics for the roughly 98,000 street view images using machine learning algorithms and obtains the high-resolution distribution of soundscape metrics at the city level. Section 5 discusses the advantages of street images for predicting soundscape metrics and the limitations of this study. Finally, conclusions are presented in Section 6. Our work enables soundscape visualization, which helps to understand the distribution of the soundscape, reveals the relationship between the urban visual environment and the soundscape, and supports urban planning and design, improvement of the urban environment, health and well-being, urban marketing and attractiveness, and community participation and decision-making. These benefits contribute to the creation of livable and sustainable cities that enhance residents' quality of life and the competitiveness of cities.
2. Methodology
The integrated framework proposed in this study consists of three main parts (Fig. 1). First, visual features of street panorama images are extracted with computer vision algorithms and deep learning models at three levels: pixel-level, object-level, and semantic-level features. Second, street soundscape indicators are constructed from four aspects: sound intensity, sound quality, sound source, and perceived emotion. Third, the GBRT model is used to construct a soundscape prediction model to realize the measurement of street soundscape by humans
Soundscape is a conceptual framework for acoustic or sound-related issues involving the physical properties of sound, spatial distribution, environmental factors, and the perceptual and emotional responses of human hearing (Hasegawa and Lau, 2022b). As shown in Fig. 2, to construct a soundscape indicator system that spans from the sound environment to human emotional response for evaluating the urban soundscape, a total of 15 perceptual indicators were identified from the existing literature, covering four aspects: sound intensity, sound source, human perception, and sound quality (Axelsson et al., 2014; Liu et al., 2019). Based on the
whole, i.e., "sound quality", and the other is the perceived emotion of different sounds, which is mainly categorized into eight subcategories: pleasant, chaotic, energetic, bland, calm, annoying, meddlesome, and monotonous. As shown in Table 1, we established a soundscape indicator system with four categories and fifteen subcategories.
The Street Soundscape Perception Survey was designed to collect the soundscape metrics listed in Table 1. Using the acquired images together with the recorded audio, each street scene image was scored individually on each of the 15 soundscape metrics. To minimize experimental error, we divided participants into an offline group of local residents and an online group of non-local residents. In total, there were 20 offline participants and 200 online participants.
Fig. 2. Urban soundscape indication system from acoustic environment to human response
Table 1
Content of the soundscape perception survey.
Question | Indicator | Scale
1. Overall, how do you feel about the overall sound intensity (noisy or quiet) from the audio? | Sound intensity | Very noisy, noisy, generally, quiet, very quiet
2. Overall, how do you feel about the overall sound quality (good or bad) from the audio? | Sound quality | Very bad, bad, generally, good, very good
3. How much do you currently experience the following sound types in the above scene? | Traffic noise, human sounds, natural sound, mechanical noise, musical noise | No sensation at all, not dominant, generally, dominant, completely dominant
4. To what extent do you agree or disagree with the consistency of the following feelings about the sound environment with the above scenario? | Pleasant, chaotic, vibrant, uneventful, calm, annoying, eventful, monotonous | Completely disagree, disagree, generally, agree, completely agree
2.2 Visual Characteristics of Streetscape Images
SVI provides a unique view of ground-level urban landscapes with extensive coverage and fine spatial sampling, and has been widely used in urban built environment studies at multiple scales (Biljecki and Ito, 2021). These images are usually labeled according to the research purpose, and computer vision techniques are then used to construct visual features at the pixel, object, semantic, and scene levels. Specifically, pixel-level features characterize the overall impression of the SVI (e.g., brightness and saturation) and influence emotional perception. Object-level visual features come from detecting, recognizing, and tracking objects in an image, e.g., cars and people. Semantic-level visual features come from semantic segmentation and semantic understanding of an image, which extract the semantic information of different regions, for example, the proportion of vegetation, sky, and roads. Scene-level visual features come from scene understanding and categorization of the whole image, which extract its overall character, for example, park or highway. Since this study focuses on streetscapes, scene-level visual features are not analyzed further.
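As a minimal sketch of how object-level and semantic-level features could be reduced to numbers, the snippet below counts detections per class and computes the share of pixels per semantic class. The helper names (`semantic_proportions`, `object_counts`), the 0.5 confidence threshold, and the toy inputs are illustrative assumptions rather than part of the study's pipeline; in practice a detector and a segmentation network would supply the inputs.

```python
from collections import Counter

def semantic_proportions(label_map):
    """Return the share of pixels per class in a 2-D list of class labels."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def object_counts(detections, min_confidence=0.5):
    """Count detected objects per class, ignoring low-confidence detections."""
    return Counter(name for name, conf in detections if conf >= min_confidence)

# Toy 2x4 "segmentation" and detector output (made-up values for illustration)
label_map = [["sky", "sky", "vegetation", "building"],
             ["road", "road", "vegetation", "building"]]
detections = [("car", 0.91), ("car", 0.40), ("person", 0.77)]

props = semantic_proportions(label_map)
counts = object_counts(detections)
print(props)
print(counts)
```

Expressing semantic features as proportions keeps them comparable across images of different sizes, while object features stay as raw counts.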
As shown in Table 2, visual features are extracted at three levels: pixel level, object level, and semantic level. Pixel-level features are extracted with the OpenCV library by converting each image from the RGB color space to the HSV (Hue, Saturation, Value) color space and computing histograms of the individual color channels to obtain the color features of the image. Object-level feature extraction identifies and counts elements of 91 object types (e.g., buses, people, and trucks) using the yolov5-master object detection implementation with the COCO dataset. For the semantic segmentation of SVI, an FCN-8s model trained on the ADE20K dataset is used, which labels 18 types of classes (e.g., sky, vegetation, roads, and buildings) in the SVI. By analyzing street scene images, this study explores the relationship between street scene visual features and human perception, aiming to identify the key visual features that affect human perception.
Table 2
Summary of feature extraction models and algorithms.
Feature | Model/Lib | Dataset | Features
Pixel-level features | OpenCV | - | Hue, Saturation, Lightness, Hue_std, Saturation_std, Lightness_std
Object-level features | Yolov5-master | COCO | 91 object types (person, bus, truck, motorcycle, etc.)
Semantic-level features | FCN-8s | Cityscapes | 18 categories (building, sky, road, etc.)
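The pixel-level features can be illustrated with a short sketch. It is an assumption-laden stand-in: it uses Python's standard `colorsys` module instead of OpenCV, takes a plain list of RGB tuples rather than an image file, and uses the population standard deviation. The function name `hsv_features` and the four sample pixels are hypothetical. Hue is averaged arithmetically here for simplicity, even though hue is a circular quantity.

```python
import colorsys
from statistics import mean, pstdev

def hsv_features(rgb_pixels):
    """Mean and standard deviation of hue, saturation, and value.

    rgb_pixels: iterable of (r, g, b) tuples with components in 0..255.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]
    features = {}
    for idx, name in enumerate(("hue", "saturation", "value")):
        channel = [p[idx] for p in hsv]
        features[f"{name}_mean"] = mean(channel)
        features[f"{name}_std"] = pstdev(channel)
    return features

# Four made-up pixels: pure red, green, blue, and white
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
feats = hsv_features(pixels)
print(feats)
```

With a real image, the same statistics would be computed over every pixel, giving six numbers per image that can be fed to the prediction model alongside the object and semantic features.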
Predicting each soundscape metric is treated as a supervised regression task. Random forest is an ensemble learning method that performs classification or regression by combining multiple decision trees. Its main feature is that each decision tree is trained on a randomly selected subset of samples and features.
The basic steps of random forest are as follows. First, a portion of the samples is randomly drawn from the training set with replacement (bootstrap sampling) to build a decision tree. Second, a random subset of features is selected for training that decision tree. Third, steps 1 and 2 are repeated to construct multiple decision trees. Fourth, for a classification task each decision tree votes on the prediction, while for a regression task the predictions of the individual trees are averaged. Fifth, the final prediction is obtained by aggregating the predictions of all decision trees.
The prediction accuracy of the random forest model is mainly affected by the number of regression trees, their depth, and the training samples. In general, accuracy improves as the number of trees increases; however, if the trees are too deep, the model may overfit, which reduces its accuracy on unseen data. When the training data contain unbalanced numbers of samples from different categories, the model may predict well for the larger categories and poorly for the smaller ones. We combined the street visual features with the corresponding soundscape metrics, using a total of 115 street visual features as inputs to predict 15 different soundscape metrics.
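The random forest steps described above can be sketched from scratch in a few lines. This is a deliberately simplified illustration, not the study's implementation: each tree is a depth-1 regression tree (a stump), and the function names, the toy data, and the parameter values are all assumptions made for the example.

```python
import random
from statistics import mean

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree using only the given candidate features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: always predict the sample mean
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=50, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample (with replacement)
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Average the predictions of all trees (regression)."""
    return mean(lm if row[f] <= thr else rm for f, thr, lm, rm in forest)

# Hypothetical data: [vehicle pixel share, unrelated feature] -> sound intensity score
X = [[0.1, 5], [0.2, 3], [0.7, 4], [0.9, 6]]
y = [2, 2, 4, 5]
forest = fit_forest(X, y)
low = predict(forest, [0.1, 4])
high = predict(forest, [0.9, 4])
print(low, high)
```

In practice a library implementation with deeper trees would be used; the sketch only makes the bootstrap sampling, feature subsetting, and averaging steps concrete.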
Fig. 3. Random forest flow chart
2.4 Entropy weight method
The entropy weight method is a purely objective evaluation method based on the principle that the greater the dispersion of an indicator, the lower its information entropy and the greater the amount of information it contains. If all values of an indicator are equal, the indicator plays no role in the comprehensive evaluation. In this paper, the entropy weight method is used to determine the weights of the visual features of SVI objectively. It consists of five steps: first, construct the judgment matrix; second, make the data dimensionless; third, calculate the weight of the indexes; fourth, calculate the entropy value and the coefficient of discrepancy; and last, calculate the entropy weight. The final result is given by the following formula:
$$ w_j = \frac{d_j}{\sum_{j=1}^{m} d_j} $$
where $w_j$ is the weight of the $j$th indicator, $d_j$ is the coefficient of discrepancy of the $j$th indicator, and $m$ is the number of indicators.
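A compact sketch of the entropy weight calculation follows. Only the final formula is given in the text, so the intermediate steps are assumptions based on the commonly used form of the method: min-max scaling, proportions $p_{ij}$, entropy $e_j = -\frac{1}{\ln n}\sum_i p_{ij}\ln p_{ij}$, and $d_j = 1 - e_j$. The function name and the sample matrix are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weights for the columns (indicators) of a samples x indicators matrix."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    d = []
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        # Min-max scaling; a constant column is mapped to all ones
        scaled = [1.0] * n if hi == lo else [(v - lo) / (hi - lo) for v in col]
        total = sum(scaled)
        p = [v / total for v in scaled]
        e = -k * sum(q * math.log(q) for q in p if q > 0)  # 0 * ln 0 is treated as 0
        d.append(max(0.0, 1.0 - e))                        # coefficient of discrepancy
    s = sum(d)
    if s == 0:  # every indicator is constant: fall back to equal weights
        return [1.0 / m] * m
    return [dj / s for dj in d]

# Made-up scores: 3 samples x 3 indicators; the middle indicator is constant
data = [[1.0, 5.0, 10.0],
        [2.0, 5.0, 10.5],
        [3.0, 5.0, 50.0]]
w = entropy_weights(data)
print(w)
```

The constant middle column receives a weight of essentially zero, matching the statement that an indicator with equal values plays no role in the evaluation.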
area and one of the most economically developed areas in Fujian Province, and traffic noise is the main noise source. Therefore, exploring the visual elements and spatial patterns of the regional soundscape is of great significance for promoting the sustainable development of the city.
Fig. 4. Overview of the geographical location of the study area.
part.
Audio data were collected from 45 randomly selected survey sites in the main urban area of Fuzhou City. Each collection consisted of a three-minute video, 4-10 panoramic images, and a three-minute recording of changes in sound intensity. The measurement equipment included ① a sound level meter (UT353BT) to measure sound intensity, ② using a
is constructed using SVI of Fuzhou City, where 70% of the SVI is used as the training dataset and 30% as the test dataset. Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R2) were used to evaluate the model. Taking sound intensity as an example, as shown in Table 3, the MAPE across models ranges from 3.443 to 7.759, and the R2 ranges from 0.421 to 0.776. The KNN model performs the worst on the dataset, while the RF model achieves the best MAPE and R2. Therefore, the RF model is used as the prediction model.
Table 3
Sound intensity prediction accuracy in different models.
Model | MAPE (%) | R2
KNN | 7.759 | 0.421
BP | 5.608 | 0.534
SVR | 7.729 | 0.425
RF | 3.443 | 0.776
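For reference, the two evaluation metrics can be computed as in the sketch below, which uses their standard definitions. The function names and the sample values are made up for illustration and are not taken from the study's data.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical survey scores and model predictions
y_true = [2.0, 4.0, 5.0, 4.0]
y_pred = [2.2, 3.8, 4.5, 4.0]
error_pct = mape(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# The same absolute error of 0.2 weighs far more on a small true value
small_value_error = mape([0.1], [0.3])
print(error_pct, r2, small_value_error)
```

Because MAPE divides by the true value, indicators whose scores are typically small can show a large MAPE even when the absolute error is modest, which is worth keeping in mind when comparing MAPE across indicators.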
MAPE and R2 were used to assess the random forest predictions for the 15 soundscape metrics. As shown in Fig. 5 and Fig. 6, the MAPE values of the different soundscape metrics vary considerably. Among them, the MAPE of music noise is 19.43, while that of chaotic is 31.19. Zhao et al. (2023) suggested that this is because the musical noise indicator value is typically tiny in most situations, so the same absolute error may result in a higher MAPE. Similar to MAPE, R2 varied across soundscape metrics. The higher R2 values were for sound intensity and traffic noise, at 0.63 and 0.58, respectively, while music and calm had lower R2 values of 0.26 and 0.29. This result suggests that people are more sensitive to the perception of sound intensity, traffic noise, chaotic, and meddlesome, and less sensitive to the perception of sound attributes such as flat and monotonous, which is in line with our expectations and with previous research (Axelsson et al., 2010).
Fig. 6. The R2 of soundscape indicators prediction model
soundscape in the main urban area of Fuzhou City has a certain degree of reliability. Some of the field sound measurements differed significantly from the predicted values, possibly because sound intensity measured over three minutes may not be representative.
Using EWM to calculate the weights, Fig. 8 shows that the entropy weight of sound intensity is significantly higher than that of the other indicators, followed by sound quality, while the entropy weights of indicators such as bland, music noise, and calm are lower. This indicates that people are more sensitive to indicators such as sound intensity and sound quality than to the latter indicators. This paper explores the spatial variability of the indicators whose weights cumulatively exceed 60%.
4.2.1 Sound Intensity Distribution Map
Sound intensity refers to the amount of energy in a sound and is one of the most important indicators for assessing sound. It affects both the quality and clarity of sound and is also related to hearing protection and environmental noise control. Therefore, this indicator is analyzed further. The sound intensity distribution in the main urban area of Fuzhou City is shown in Fig. 9. Overall, sound intensity is low in the center and high in the north and south. Most of the high-intensity areas are concentrated along highways and in development zones, while the low-intensity areas are mostly concentrated in parks and along the Wulong and Min rivers, which is consistent with our expectations. Specifically, the areas with higher sound intensity include highways and construction sites, such as infrastructure construction in development zones (①) and the Third Ring Expressway (②). To our surprise, the sound intensity in the busiest core business district, located at Dongjiekou in Fuzhou City, is lower than expected. This may be because these areas are also well vegetated, as shown in the corresponding street view image (③), which may attenuate the perception of sound intensity. This is consistent with the findings of Van Renterghem (2019), who suggested that vegetation can strongly improve environmental noise perception. Noise levels in residential areas such as Huangshan New Town are low to medium (④). Low-intensity areas were identified as parks with more vegetation as well as mountain forests (⑤). In general, the distribution of sound intensity is highly correlated with urban functions, which is consistent with Monazzam et al. (2015), who showed that noise levels vary across land uses.
Fig. 9. Distribution of sound intensity in the main urban area of Fuzhou
4.2.2 Typical soundscape indicator distribution
We further explored the sound quality, traffic noise, natural sound, chaotic, eventful, and vibrant metrics. The results are shown in Fig. 10. The areas with better sound quality are mainly located near parks and scenic areas, such as West Lake Park, Minjiang Park, and Gushan Scenic Area. Areas with poorer sound quality are mainly concentrated in suburban areas with more highways and construction sites. Natural sound values are usually higher in park areas with more vegetation in the center. Traffic noise has a distribution similar to that of chaotic and eventful, with higher values concentrated near freeways and downtown attractions. To our surprise, developed areas such as Sanfangqixiang, Dongjiekou, and Wanda have higher vibrancy values despite being busy and having more traffic noise. This may be because developed areas in the main urban area of Fuzhou City are greener and more orderly, providing a more pleasant environment for residents.
4.3 Relationship between soundscape indicators and visual features
A multiple regression model was used to explore the contribution of visual features to the soundscape indicators. To improve the interpretability of the model and to minimize redundancy among variables, the study summarized the set of 115 visual features into 19 variables (Table 4). Backward stepwise regression was used to select the variables. The process included (a) selecting a significance level (e.g., 0.05), below which variables are retained in the model; (b) removing the variable with the largest p-value from the model and re-fitting the model; (c) evaluating the fit of the model after removing the variable by means of a statistical index and, if the assessment is unsatisfactory, returning to step (b) to remove the variable with the largest p-value; and (d) repeating steps (b) and (c) until the end condition is met, that is, until the p-values of all remaining variables are less than the significance level.
Fig. 10. Spatial distribution of typical soundscape indicators (panels include Natural sound, Annoying, Eventful, and Vibrant).
Table 4
Regression analysis variables.
vehicle_semantic | Percentage of vehicle pixels in the SVI
building_semantic | Percentage of building pixels in the SVI
other_semantic | Percentage of pixels from other categories in the Cityscapes dataset in the SVI
The visual features of the streetscape and the soundscape indicators were analyzed by multiple regression, and the results are shown in Fig. 11. For each indicator, we selected the six visual features with the largest contributions. The length of each bar indicates the standardized coefficient. Overall, the street scene visual features contribute differently to the different soundscape indicators.
For sound intensity, vehicle_semantic, car_object, and bus_object show significant positive correlations, whereas lightness_mean and lightness_std are the most strongly negatively correlated visual features. In terms of sound quality, nature_semantic was positively correlated with the sound quality score, whereas vehicle_semantic, building_semantic, and truck_object were negatively correlated with it, which is consistent with our expectations. Two pixel-level features, saturation_std and lightness_mean, also appear in the sound quality list, suggesting that these two visual features can make a significant difference in human perception of sound quality.
As far as sound sources are concerned, traffic noise and mechanical noise are affected by the visual features in similar ways; e.g., sky_semantic and building_semantic have the same positive effect on both. However, truck_object does not appear in the list for mechanical noise, probably because the number of trucks in the main urban area of Fuzhou City is low and the camera therefore captures few of them. Human voice and music noise are positively associated with person_object and building_semantic. The visual elements with the strongest positive and negative correlations with natural sound were nature_semantic and building_semantic, respectively. It is worth noting that the assessment of sound sources is mainly based on human a priori knowledge rather than on immersive experience, which may bias some perceptions (Paes et al., 2021). For example, even though there are no moving vehicles on the highway, there are also perceptions of significant
and lightness_mean, and negatively correlated with vehicle_semantic and bus_object. Vibrant is positively correlated with nature_semantic, sky_semantic, lightness_mean, and saturation_std, and negatively with bus_object and other_object. This result is consistent with Chesnokova and Purves (2018), who found that humans perceive natural sounds as pleasant and noises such as vehicles negatively. Chaotic, eventful, and annoying were positively influenced by similar visual features such as person_object, vehicle_semantic, and car_object. This is because the richer the objects within the street scene, the more complex the scene, the more crowded humans perceive the street to be, and the lower their perceptual emotion. Bland, calm, and monotonous showed strong associations with most visual features: sky_semantic and building_semantic both had positive effects, while car_object had a negative effect on these soundscape metrics.
Fig. 11. The results of the multivariate regression analysis between the visual features and soundscape indicators.
To explore the relationships among the soundscape metrics in the streetscape images, we performed a correlation analysis of the soundscape metrics, as shown in Fig. 12. The soundscape metrics were grouped into four categories: I (sound intensity), Q (sound quality), S (sound source) and P (perception). Overall, there is a high positive correlation between Sound intensity (I) and Traffic noise (S), Chaotic (P), Mechanical noise (S), Annoying (P), and Eventful (P); that is, when the sound intensity is higher, the Traffic noise, Chaotic, Mechanical noise, Annoying, and Eventful scores are also higher. There is a strong positive correlation among Human sounds, Musical noise, Vibrant and Pleasant. In addition, Sound quality, Nature Sound, and Calm showed high positive correlations with one another. In contrast, the noise-related metrics (Sound intensity (I), Traffic noise (S), Chaotic (P), Mechanical noise (S), Annoying (P), and Eventful (P)) are strongly negatively correlated with the high-quality sound metrics (Sound quality, Nature Sound, and Calm). Specifically, the correlation coefficient between Sound intensity (I) and Traffic noise (S) was 0.71, which means that an increase in Sound intensity is accompanied by an increase in Traffic noise. There was also a high positive correlation (0.69) between Sound intensity and Chaotic, suggesting that scenes with higher Sound intensity are usually perceived as chaotic. The correlation coefficient between Sound intensity (I) and Mechanical noise (S) was 0.68, which means that an increase in sound intensity is accompanied by an increase in mechanical noise. On the contrary, the correlation coefficient between sound intensity (I) and quality of noise (S) was -0.42, which suggests that louder scenes tend to have lower sound quality and fewer natural sounds. The correlation coefficient between Sound intensity (I) and Calm (P) was -0.58, which means that higher sound intensity tends to coincide with a less calm environment. There is also a positive correlation (0.51) between Sound quality and Nature Sound, indicating that scenes with better sound quality are usually accompanied by nature sounds.
In summary, sound intensity is correlated with noise type, sound quality, and the calmness of the environment. These correlation results can serve as a reference for environmental noise management, acoustic design and related work.
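The pairwise coefficients discussed above are ordinary Pearson correlations between soundscape metrics. The sketch below shows how such a correlation matrix can be computed; the metric names follow the text, but the sample values are invented for illustration and are not the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict {metric_a: {metric_b: r}} for all metric pairs."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical scores for five street scenes (assumed values, not from the paper).
metrics = {
    "sound_intensity": [55.0, 62.0, 70.0, 48.0, 66.0],
    "traffic_noise": [2.1, 3.0, 4.2, 1.5, 3.6],
    "calm": [4.0, 3.1, 1.8, 4.5, 2.5],
}

matrix = correlation_matrix(metrics)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

With these toy values, sound_intensity correlates positively with traffic_noise and negatively with calm, the same sign pattern the section reports; correlation alone does not establish that one metric causes another.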
5. Discussion
As a data source with wide coverage and easy access, SVI offers significant advantages for assessing urban street soundscapes. (1) Large-scale assessment: SVI provides a large number of data samples covering a wide range of urban areas, which enables soundscape assessment at the city level and yields more comprehensive and accurate results. (2) High-resolution information: streetscape images provide high-resolution visual information that captures subtle visual elements of the landscape. These elements may be associated with soundscape indicators, so analyzing the landscape features in streetscape images helps us better understand how soundscapes form. (3) Time and cost-effectiveness: using existing streetscape image data avoids the tedious process of conducting field surveys or manually collecting data, which makes soundscape assessment and prediction more efficient and feasible. (4) Close correlation between visual and auditory perception: the relationship between sound and vision has already been used to predict sound, among other things (Salem et al., 2018b), so it is feasible to use visual data to assess soundscape. (5) Visualization and decision support: by combining predicted soundscape metrics with geographic information systems (GIS), we can generate high-resolution maps of the distribution of soundscape metrics. These maps can give urban planners, environmental protection agencies and the public decision support regarding soundscape quality, promoting improvement of the urban environment and people's quality of life. In short, using streetscape imagery to predict soundscape offers large-scale assessment, high-resolution information, time and cost-effectiveness, and visualization and decision support, making it a practical tool and methodology for research and practice.
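To make the mapping step in point (5) concrete, the sketch below averages point-level predictions into square grid cells, which is one simple way to prepare values for a GIS layer. It is an assumption-laden example: the function `grid_mean`, the 100 m cell size, and the projected x/y coordinates in meters are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from math import floor


def grid_mean(points, cell_size):
    """Average predicted values per square grid cell.

    points:    iterable of (x, y, value), with x and y in projected meters.
    cell_size: edge length of a grid cell in the same unit as x and y.
    Returns {(col, row): mean_value}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for x, y, value in points:
        cell = (floor(x / cell_size), floor(y / cell_size))
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Hypothetical predicted sound intensity at three street view sampling points.
predictions = [(10.0, 10.0, 60.0), (40.0, 20.0, 70.0), (120.0, 30.0, 50.0)]
print(grid_mean(predictions, cell_size=100.0))
```

Each resulting cell value can then be joined to a grid layer in GIS software; finer cells preserve more local detail but leave more cells without any sampling point.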
such as vehicle noise, bird calls, and human voices. Streetscape imagery cannot accurately capture all of these sounds, so images alone may not fully predict the street soundscape. Future work on SVI-based soundscape prediction can proceed in two directions. The first is expanding and diversifying the data set: the current empirical study uses the SVI of Fuzhou City as an example, and future studies can collect street view image data from more cities with different geographic and cultural backgrounds. This would make the prediction model more general and adaptable, and applicable to a wider range of urban environments. The second is optimizing and improving the model. The street view image prediction method based on machine learning algorithms has achieved a certain degree of effectiveness in soundscape prediction, but there is still room for improvement. In the future, the algorithm can be further optimized to improve the accuracy and stability of the model; for example, more complex deep learning models such as convolutional neural networks (CNN) and recurrent neural networks (RNN) can be tried to improve prediction performance.
6. Conclusion
The study first uses computer vision methods to extract landscape visual feature indicators from large-area streetscape images, and then correlates the 15 collected soundscape indicators with the landscape visual indicators to construct a prediction model, which is applied to 98,000 streetscape images in Fuzhou City for empirical demonstration. The results show that SVI can be used to predict the street soundscape, verifying the effectiveness of the machine-learning-based street view image prediction method in soundscape prediction. This research provides an alternative to traditional noise detection for fine-resolution prediction of large-area soundscapes, and makes the following contributions:
(1) Streetscape images can be used as a tool for assessing soundscape quality. By analyzing elements such as buildings, greenery, and traffic in streetscape images, we can obtain visual features of the urban environment. These features are related to the propagation and reflection of sound, so they can serve as important indicators for assessing soundscape quality.
(2) The visual features of the urban environment are correlated with soundscape quality. We found that factors such as green area, building height and traffic density in the streetscape images are correlated with indicators such as sound clarity and noise level. This suggests that analyzing streetscape images allows an initial prediction of soundscape quality.
(3) Assessing soundscape quality with SVI can provide a reference for urban planning and environmental improvement. By using streetscape images to assess soundscape quality, we can gain a more comprehensive understanding of the distribution of sound in the urban environment and the factors that influence it. This will help urban planners consider soundscape quality when designing urban environments and provide more comfortable and livable urban spaces.
In summary, this study demonstrates that the soundscape of a large urban area can be effectively predicted using machine learning algorithms and streetscape imagery. This approach bypasses cumbersome ground-based measurements, can be deployed at large scale and fine spatial resolution, and allows comparative analysis across multiple cities. It provides strong support for the prediction and planning of urban soundscapes, helping to create a higher-quality urban soundscape environment that plays a key role in the health and well-being of citizens.
Quanquan Rui contributed to the central idea, data analysis, and initial draft writing of the paper.
Huishan Cheng contributed to refining the ideas and conducting additional analyses.
Acknowledgements
We would like to express our gratitude to the editors and anonymous reviewers for their invaluable comments on this manuscript.
Declarations
Consent to publish: Not applicable.
Competing interests: The authors declare no competing interests.
References:
Andringa, T.C. and Lanser, J.J.L., 2013. How pleasant sounds promote and annoying sounds impede health: a cognitive approach. International Journal of Environmental Research and Public Health, 10(4): 1439-1461.
Appleyard, D. and Lintell, M., 1972. The environmental quality of city streets: the residents' viewpoint. Journal of the American Institute of Planners, 38(2): 84-101.
Axelsson, Ö., Nilsson, M.E. and Berglund, B., 2010. A principal components model of soundscape perception. The Journal of the Acoustical Society of America, 128(5): 2836-2846.
Axelsson, Ö., Nilsson, M.E., Hellström, B. and Lundén, P., 2014. A field experiment on the impact of sounds from a jet-and-basin fountain on soundscape quality in an urban park. Landscape and Urban Planning, 123: 49-60.
Barber, J.R., Burdett, C.L., Reed, S.E., Warner, K.A., Formichella, C., Crooks, K.R., Theobald, D.M. and Fristrup, K.M., 2011. Anthropogenic noise exposure in protected natural areas: estimating the scale of ecological consequences. Landscape Ecology, 26: 1281-1295.
Biljecki, F. and Ito, K., 2021. Street view imagery in urban analytics and GIS: a review. Landscape and Urban Planning, 215: 104217.
Branas, C.C., South, E., Kondo, M.C., Hohl, B.C., Bourgois, P., Wiebe, D.J. and MacDonald, J.M., 2018. Citywide cluster randomized trial to restore blighted vacant land and its effects on violence, crime, and fear. Proceedings of the National Academy of Sciences, 115(12): 2946-2951.
Brooks, B., 2016. The soundscape standard. Institute of Noise Control Engineering, pp. 2188-2192.
Brown, A.L., 2011. Advancing the concepts of soundscapes and soundscape planning.
Carles, J.L., Barrio, I.L. and De Lucio, J.V., 1999. Sound influence on landscape values. Landscape and
Jimenez, M.T., Helmstädter, J. and Steven, S., 2019. Environmental noise induces the release of stress hormones and inflammatory signaling molecules leading to oxidative stress and vascular dysfunction—signatures of the internal exposome. Biofactors, 45(4): 495-506.
Dubey, A., Naik, N., Parikh, D., Raskar, R. and Hidalgo, C.A., 2016. Deep learning the city: quantifying
e2220417120.
Gasco, L., Asensio, C. and De Arcas, G., 2017. Towards the assessment of community response to noise through social media. Institute of Noise Control Engineering, pp. 2209-2217.
Gasco, L., Clavel, C., Asensio, C. and de Arcas, G., 2019. Beyond sound level monitoring: exploitation of social media to gather citizens subjective response to noise. Science of the Total Environment, 658: 69-79.
Gasco, L., Schifanella, R., Aiello, L.M., Quercia, D., Asensio, C. and de Arcas, G., 2020. Social media and open data to quantify the effects of noise on health. Frontiers in Sustainable Cities, 2: 41.
Goines, L. and Hagler, L., 2007. Noise pollution: a modern plague. South Med J, 100(3): 287-94.
Guan, F., Fang, Z., Wang, L., Zhang, X., Zhong, H. and Huang, H., 2022. Modelling people's perceived scene complexity of real-world environments using street-view panoramas and open geodata. ISPRS Journal of Photogrammetry and Remote Sensing, 186: 315-331.
Guo, Z. and Loo, B.P., 2013. Pedestrian environment and route choice: evidence from New York City and Hong Kong. Journal of Transport Geography, 28: 124-136.
Hara, K., Le, V. and Froehlich, J., 2013. Combining crowdsourcing and Google Street View to identify street-level accessibility problems, pp. 631-640.
Hasegawa, Y. and Lau, S., 2022a. Comprehensive audio-visual environmental effects on residential soundscapes and satisfaction: partial least square structural equation modeling approach. Landscape and Urban Planning, 220: 104351.
Hasegawa, Y. and Lau, S., 2022b. Comprehensive audio-visual environmental effects on residential soundscapes and satisfaction: partial least square structural equation modeling approach. Landscape and Urban Planning, 220: 104351.
Hassen, N. and Kaufman, P., 2016. Examining the role of urban street design in enhancing community engagement: a literature review. Health & Place, 41: 119-132.
Hawes, J.K., Gounaridis, D. and Newell, J.P., 2022. Does urban agriculture lead to gentrification? Landscape and Urban Planning, 225: 104447.
Herranz-Pascual, K., Aspuru, I. and García, I., 2010. Proposed conceptual model of environmental experience as framework to study the soundscape, pp. 2904-2912.
Ignatius, M., Xu, R., Hou, Y., Liang, X., Zhao, T., Chen, S., Wong, N.H. and Biljecki, F., 2022. Local climate zones: lessons from Singapore and potential improvement with street view imagery. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 10: 121-128.
Jenkins, L., Tarnopolsky, A. and Hand, D., 1981. Psychiatric admissions and aircraft noise from London
Kelly, C.M., Wilson, J.S., Baker, E.A., Miller, D.K. and Schootman, M., 2013. Using Google Street View to audit the built environment: inter-rater reliability results. Annals of Behavioral Medicine, 45(suppl_1): S108-S112.
Keralis, J.M., Javanmardi, M., Khanna, S., Dwivedi, P., Huang, D., Tasdizen, T. and Nguyen, Q.C., 2020. Health and the built environment in United States cities: measuring associations using Google Street View-derived indicators of the built environment. BMC Public Health, 20: 1-10.
Kruse, J., Kang, Y., Liu, Y., Zhang, F. and Gao, S., 2021. Places for play: understanding human perception of playability in cities using street view images and deep learning. Computers, Environment and Urban Systems, 90: 101693.
Liu, J., Kang, J., Luo, T. and Behm, H., 2013. Landscape effects on soundscape experience in city parks. Science of the Total Environment, 454: 474-481.
Liu, J., Kang, J., Luo, T., Behm, H. and Coppack, T., 2013a. Spatiotemporal variability of soundscapes in a multiple functional urban area. Landscape and Urban Planning, 115: 1-9.
Liu, J., Kang, J., Luo, T., Behm, H. and Coppack, T., 2013b. Spatiotemporal variability of soundscapes in a multiple functional urban area. Landscape and Urban Planning, 115: 1-9.
Liu, J., Wang, Y., Zimmer, C., Kang, J. and Yu, T., 2019. Factors associated with soundscape experiences in urban green spaces: a case study in Rostock, Germany. Urban Forestry & Urban Greening, 37: 135-146.
Long, Y. and Liu, L., 2017. How green are the streets? An analysis for central areas of Chinese cities using Tencent Street View. PLoS ONE, 12(2): e0171110.
McKee, P., Erickson, D.J., Toomey, T., Nelson, T., Less, E.L., Joshi, S. and Jones-Webb, R., 2017. The impact of single-container malt liquor sales restrictions on urban crime. Journal of Urban Health, 94: 289-300.
Meecham, W.C. and Smith, H.G., 1977. Effects of jet aircraft noise on mental hospital admissions. British Journal of Audiology, 11(3): 81-85.
Meng, Q. and Kang, J., 2015. The influence of crowd density on the sound environment of commercial pedestrian streets. Science of the Total Environment, 511: 249-258.
Meng, Q., An, Y. and Yang, D., 2021. Effects of acoustic environment on design work performance based on multitask visual cognitive performance in office space. Building and Environment, 205: 108296.
Meng, Q., Sun, Y. and Kang, J., 2017. Effect of temporary open-air markets on the sound environment and acoustic perception based on the crowd density characteristics. Science of the Total Environment, 601: 1488-1495.
Monazzam, M.R., Karimi, E., Nassiri, P. and Taghavi, L., 2015. School-reopening impact on traffic-induced noise level at different land uses: a case study. International Journal of Environmental Science and Technology, 12: 3089-3094.
Nguyen, Q.C., Khanna, S., Dwivedi, P., Huang, D., Huang, Y., Tasdizen, T., Brunisholz, K.D., Li, F., Gorman, W. and Nguyen, T.T., 2019. Using Google Street View to examine associations between built environment characteristics and US health outcomes. Preventive Medicine Reports, 14: 100859.
Nilsson, M.E. and Berglund, B., 2006. Soundscape quality in suburban green areas and city parks. Acta Acustica united with Acustica, 92(6): 903-911.
Ning, H., Li, Z., Wang, C., Hodgson, M.E., Huang, X. and Li, X., 2022. Converting street view images to land cover maps for metric mapping: a case study on sidewalk network extraction for the wheelchair users. Computers, Environment and Urban Systems, 95: 101808.
Paes, D., Irizarry, J. and Pujoni, D., 2021. An evidence of cognitive benefits from immersive design review: comparing three-dimensional perception and presence between immersive and non-immersive virtual environments. Automation in Construction, 130: 103849.
Perkins, D.D., Meeks, J.W. and Taylor, R.B., 1992. The physical environment of street blocks and resident perceptions of crime and disorder: implications for theory and measurement. Journal of Environmental Psychology, 12(1): 21-34.
Ryu, H., Ki, K.S., Yoo, J., Chang, S.I. and Kim, B., 2018. Sound grade classification with sound mapping of national park trails in South Korea. The Journal of the Acoustical Society of America, 144(3): 1931-1931.
Salem, T., Zhai, M., Workman, S. and Jacobs, N., 2018a. A multimodal approach to mapping soundscapes, pp. 2524-2527.
Salem, T., Zhai, M., Workman, S. and Jacobs, N., 2018b. A multimodal approach to mapping soundscapes, pp. 2524-2527.
Sammler, D., Grigutsch, M., Fritz, T. and Koelsch, S., 2007. Music and emotion: electrophysiological correlates of the processing of pleasant and unpleasant music. Psychophysiology, 44(2): 293-304.
Schafer, R.M., 1993. The soundscape: our sonic environment and the tuning of the world. Destiny Books, United States; Rochester, Vt.
Schroeder, H.W. and Anderson, L.M., 1984. Perception of personal safety in urban recreation sites. Journal of Leisure Research, 16(2): 178-194.
Shi, W., 2021. Introduction to urban sensing. Urban Informatics: 311-314.
Skånberg, A. and Öhrström, E., 2002. Adverse health effects in relation to urban residential soundscapes. Journal of Sound and Vibration, 250(1): 151-155.
Song, G., Liu, L., He, S., Cai, L. and Xu, C., 2020. Safety perceptions among African migrants in Guangzhou and Foshan, China. Cities, 99: 102624.
Sun, K., De Coensel, B., Filipan, K., Aletta, F., Van Renterghem, T., De Pessemier, T., Joseph, W. and Botteldooren, D., 2019. Classification of soundscapes of urban public open spaces. Landscape and Urban Planning, 189: 139-155.
Van Renterghem, T., 2019. Towards explaining the positive effect of vegetation on the perception of environmental noise. Urban Forestry & Urban Greening, 40: 133-144.
Verma, D., Jana, A. and Ramamritham, K., 2019. Artificial intelligence and human senses for the evaluation of urban surroundings. Springer, pp. 852-857.
Verma, D., Jana, A. and Ramamritham, K., 2020a. Predicting human perception of the urban environment in a spatiotemporal urban setting using locally acquired street view images and audio clips. Building and Environment, 186: 107340.
Verma, D., Jana, A. and Ramamritham, K., 2020b. Predicting human perception of the urban environment in a spatiotemporal urban setting using locally acquired street view images and audio clips. Building and Environment, 186: 107340.
Wang, M., Chen, Z., Rong, H.H., Mu, L., Zhu, P. and Shi, Z., 2022. Ridesharing accessibility from the human eye: spatial modeling of built environment with street-level images. Computers,
Wu, D., Gong, J., Liang, J., Sun, J. and Zhang, G., 2020. Analyzing the influence of urban street greening and street buildings on summertime air pollution based on street view image data. ISPRS International Journal of Geo-Information, 9(9): 500.
Zhanjun, H.E., Wang, Z., Xie, Z., Wu, L. and Chen, Z., 2022. Multiscale analysis of the influence of street built environment on crime occurrence using street-view images. Computers, Environment and Urban Systems, 97: 101865.
Zhao, T., Liang, X., Tu, W., Huang, Z. and Biljecki, F., 2023. Sensing urban soundscapes from street view imagery. Computers, Environment and Urban Systems, 99: 101915.