
CN106991525B - Visual analysis method and system for air quality and residents' travel - Google Patents

Visual analysis method and system for air quality and residents' travel

Info

Publication number
CN106991525B
Authority
CN
China
Prior art keywords
poi
data
air quality
activity
weighted activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710173669.4A
Other languages
Chinese (zh)
Other versions
CN106991525A (en)
Inventor
谢波
姜波
潘伟丰
王家乐
殷骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University
Priority to CN201710173669.4A
Publication of CN106991525A
Application granted
Publication of CN106991525B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Instructional Devices (AREA)

Abstract

The invention discloses a big-data-driven method and system for the visual analysis of air quality and residents' travel, comprising the following steps: (1) reconstruction of the raw air quality data, temperature data, POI data and taxi-hailing difficulty data; (2) calculation of the POI weighted activity and offset rate, where the POI weighted activity reflects the volume of foot traffic around a POI and the offset rate reflects how the POI weighted activity changes; (3) clustering of POIs of the same type; (4) visual analysis of air quality and residents' travel. The invention is inexpensive, simple to maintain and quick to deploy, offers diverse interactions in its visual interface, and lets each user analyze air quality and residents' travel at multiple granularities.

Figure 201710173669

Description

Air quality and resident trip visual analysis method and system
Technical Field
The invention relates to a big-data-driven method and system for the visual analysis of air quality and residents' travel.
Background
With the progress of industrialization in China, air pollution from industrial emissions, chiefly sulfur oxides (SOx), nitrogen oxides (NOx), ozone (O3), carbon oxides (COx) and particulate matter (particle size of at most 10 microns and 2.5 microns), has become increasingly serious and greatly affects people's daily travel and life.
With the development of science and technology, data is collected and stored in large quantities and data volumes are growing explosively; extracting valuable information from this data has become an urgent problem. Faced with large and complex data, traditional data mining and data analysis methods fall short, so various data analysis and mining methods have been applied to obtain the value contained in the data.
An effective method for solving these problems is therefore needed. In recent years, visual analysis, a science of analytical reasoning supported by interactive visual interfaces, has provided a new means for data mining and data analysis. Thanks to its interactivity and visual nature it is popular with researchers and has gradually become a research hotspot.
Visualization research on air quality and residents' travel is therefore significant for studying the relationship between the two: it can provide an important reference for exploring residents' travel behavior, and it can draw the attention of relevant departments such as transportation and healthcare to air quality. Such research thus has considerable value in both theory and practical application.
Disclosure of Invention
To address the problem of analyzing air quality and residents' travel, the invention provides a big-data-driven method and system for the visual analysis of air quality and residents' travel, which helps departments such as transportation and healthcare analyze both. The visual analysis system helps a user analyze the characteristics of air quality and of residents' travel by displaying an air quality bar chart, a temperature box plot, a stacked graph and stream graph of POI (point of interest) activity, a calendar heat map of the POI activity offset rate, and a multidimensional histogram, and thereby explore urban air quality and residents' travel. The purpose of the invention is achieved by the following technical solution: a big-data-driven method for the visual analysis of air quality and residents' travel, comprising the following steps:
(1) Reconstruction of the raw air quality data, temperature data, POI data and taxi-hailing difficulty data: first, the air quality data, temperature data, POI data and taxi-hailing difficulty data are each cleaned and sorted. Data cleaning mainly consists of finding and removing abnormal and missing values in the various data sources; all data are then sorted chronologically by timestamp, which facilitates the subsequent visualization of the time-series data. The taxi-hailing difficulty data comprise the geographic coordinates and weights of the taxi-hailing difficulty distribution points. The POI data comprise the geographic coordinates of the POI distribution points and the POI types.
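The cleaning and sorting in step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (dictionaries with a `timestamp` key), the function name `clean_and_sort`, and the rule that a value outside a `[low, high]` range counts as abnormal are all assumptions made for the example.

```python
from datetime import datetime

def clean_and_sort(records, value_key, low, high):
    """Drop records with missing or out-of-range values, then sort by timestamp."""
    kept = []
    for rec in records:
        value = rec.get(value_key)
        if value is None:               # missing value
            continue
        if not (low <= value <= high):  # treated as abnormal (assumed rule)
            continue
        kept.append(rec)
    return sorted(kept, key=lambda rec: rec["timestamp"])

# Hypothetical hourly AQI readings, deliberately out of order.
raw = [
    {"timestamp": datetime(2017, 3, 1, 2), "aqi": 85},
    {"timestamp": datetime(2017, 3, 1, 0), "aqi": None},
    {"timestamp": datetime(2017, 3, 1, 1), "aqi": -5},
    {"timestamp": datetime(2017, 3, 1, 0), "aqi": 120},
]
cleaned = clean_and_sort(raw, "aqi", low=0, high=500)
```

The same function could be applied to each of the four data sources in turn, with a source-specific value key and plausibility range.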
(2) Calculation of the POI weighted activity and offset rate: the POI weighted activity reflects the volume of foot traffic around a POI; the offset rate reflects how the POI weighted activity changes.
The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point and determine whether it is smaller than a preset threshold T; if so, set the weight of the taxi-hailing difficulty distribution point as the activity weight of that POI.
(2.2) For each POI type, sum the activity of the POIs of that type and take the sum as the weighted activity of that POI type.
The offset rate of the POI weighted activity is calculated as follows:
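$$\mathrm{Offset}_t = (\mathrm{POIWeight}_t - \mathrm{Aver}_{\mathrm{week},\mathrm{hour}}) / (\mathrm{POIWeight}_t) - 1$$
where $\mathrm{Aver}_{\mathrm{week},\mathrm{hour}}$ is the mean POI weighted activity for each hour of each week, $\mathrm{POIWeight}_t$ is the POI weighted activity for the current hour, and $\mathrm{Offset}_t$ is the offset rate.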
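A minimal sketch of the offset rate calculation follows. It reads the formula literally as printed, i.e. the quotient minus 1; the extracted text leaves the placement of the trailing "-1" ambiguous, so this reading is an assumption. Treating the historical mean as a mean per (weekday, hour) slot, returning `None` for a zero activity, and the function names are likewise assumptions for illustration.

```python
from collections import defaultdict

def hourly_baseline(samples):
    """Mean weighted activity per (weekday, hour) slot.

    samples: iterable of (weekday, hour, value) tuples (assumed layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for weekday, hour, value in samples:
        sums[(weekday, hour)] += value
        counts[(weekday, hour)] += 1
    return {slot: sums[slot] / counts[slot] for slot in sums}

def offset_rate(poi_weight, aver):
    """Literal reading of the printed formula: (POIWeight_t - Aver) / POIWeight_t - 1."""
    if poi_weight == 0:
        return None  # the quotient is undefined when the current activity is zero
    return (poi_weight - aver) / poi_weight - 1

# Hypothetical history: two Mondays at 08:00 and one Tuesday at 08:00.
history = [(0, 8, 10.0), (0, 8, 30.0), (1, 8, 5.0)]
baseline = hourly_baseline(history)
current_offset = offset_rate(40.0, baseline[(0, 8)])
```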
(3) Clustering of POIs of the same type: find all POI distribution points whose Euclidean distance from each taxi-hailing difficulty distribution point is less than or equal to T, and denote them $\mathrm{POI}_{\mathrm{didi}}$. Collect the POI distribution points of the same type in $\mathrm{POI}_{\mathrm{didi}}$, calculate the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based clustering algorithm, and the newly calculated longitude and latitude of each cluster center are taken as the longitude and latitude of the POI center position.
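The text names a k-means-based algorithm without giving its details. The sketch below is plain Lloyd's k-means on 2-D coordinates for the points of one POI type; the number of clusters, the initial centers, the iteration limit and the function name `kmeans` are assumptions, not values from the patent.

```python
import math

def kmeans(points, centers, iterations=20):
    """Plain Lloyd's k-means on 2-D points, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for group, old in zip(groups, centers):
            if group:
                new_centers.append((sum(x for x, _ in group) / len(group),
                                    sum(y for _, y in group) / len(group)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:       # converged
            break
        centers = new_centers
    return centers

# Hypothetical coordinates of same-type POIs forming two obvious groups.
pois = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
cluster_centers = kmeans(pois, centers=[(0.0, 0.0), (10.0, 0.0)])
```

Each resulting center would then carry the weight of the associated taxi-hailing difficulty distribution point, as the step describes.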
(4) Visual analysis of air quality and residents' travel, specifically:
(4.1) Visual color encoding: because air quality index (AQI) values differ, colors are mapped with a dynamic scheme, i.e. the color is adjusted dynamically according to the AQI value:
[Figure BDA0001251762300000031: formula mapping the air quality index value to $\mathrm{Color}_{\mathrm{rect}}$]
where $\mathrm{Color}_{\mathrm{rect}}$ is the fill color of the rectangle.
(4.2) Bar-box plot analysis component: the air quality index of each day is shown as a rectangle; the left-to-right order of the rectangles represents the chronological order of the days, the fill color of each rectangle is determined by the scheme of step 4.1, and its height is determined by the air quality index AQI. The box plot represents the hourly temperature for each week, and its boxes run from left to right in chronological order of the dates in the week. The upper and lower dashed lines of the box plot represent the upper-quarter and lower-quarter data ranges respectively, the small rectangle in the center represents the range from the first to the third quartile, and the horizontal line in the middle of the small rectangle marks the median of the data.
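The statistics that such a box plot draws can be computed as in the sketch below. It uses Python's `statistics.quantiles`; the choice of the `inclusive` quantile method and the assumption that the dashed lines extend to the minimum and maximum are illustrative, since the text does not specify how quartiles are computed.

```python
import statistics

def box_stats(values):
    """Quartiles, median and extremes of a sample, as drawn in a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}

# Hypothetical hourly temperatures (degrees Celsius) for one box.
temps = [3.0, 5.0, 6.0, 8.0, 12.0]
stats = box_stats(temps)
```

In this reading the lower dashed line spans `min` to `q1`, the small rectangle spans `q1` to `q3`, and the upper dashed line spans `q3` to `max`.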
(4.3) Stream graph-stacked graph analysis component: the abscissa of the stacked graph and the stream graph is the hourly coordinate over the specified time range, with one week as the basic scale; the ordinate is the POI weighted activity value. The stacked graph uses area charts of different colors to represent different POI types; it is laid out on one side of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range. The stream graph is laid out on both sides of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range.
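The difference between the one-sided and two-sided layouts comes down to the baseline on which the layers are stacked. The sketch below computes the lower and upper bound of each layer at each time step; centering the stack symmetrically around zero is one common stream graph layout and is an assumption here, as is the function name `stack_layers`.

```python
def stack_layers(series, symmetric=False):
    """Return per-layer lists of (lower, upper) bounds, one pair per time step.

    series: list of layers, each a list of non-negative values of equal length.
    symmetric=False stacks upward from 0 (stacked graph);
    symmetric=True centers the stack around 0 (stream graph, assumed layout).
    """
    steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(steps):
        total = sum(layer[t] for layer in series)
        base = -total / 2 if symmetric else 0.0
        for i, layer in enumerate(series):
            layers[i].append((base, base + layer[t]))
            base += layer[t]
    return layers

# Hypothetical weighted activity of two POI types over two hours.
series = [[2.0, 4.0], [6.0, 4.0]]
stacked = stack_layers(series)
stream = stack_layers(series, symmetric=True)
```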
(4.4) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to display air quality, temperature and POI weighted activity. The calendar heat map presents multidimensional data in two-dimensional form, with the magnitude of a value encoded by the depth of the color; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap displays the activity weights and the geographic distribution of the clusters of same-type POIs.
A big-data-driven system for the visual analysis of air quality and residents' travel comprises the following components:
(1) Bar-box plot analysis component: the air quality index of each day is shown as a rectangle, and the left-to-right order of the rectangles represents the chronological order of the days. The height of each rectangle is determined by the air quality index AQI, and the fill color uses a dynamic mapping scheme, i.e. it is adjusted dynamically according to the AQI value:
[Figure BDA0001251762300000041: formula mapping the air quality index value to $\mathrm{Color}_{\mathrm{rect}}$]
where $\mathrm{Color}_{\mathrm{rect}}$ is the fill color of the rectangle.
The box plot represents the hourly temperature for each week, and its boxes run from left to right in chronological order of the dates in the week. The upper and lower dashed lines of the box plot represent the upper-quarter and lower-quarter data ranges respectively, the small rectangle in the center represents the range from the first to the third quartile, and the horizontal line in the middle of the small rectangle marks the median of the data.
(2) Stream graph-stacked graph analysis component: the abscissa of the stacked graph and the stream graph is the hourly coordinate over the specified time range, with one week as the basic scale; the ordinate is the POI weighted activity value. The stacked graph uses area charts of different colors to represent different POI types; it is laid out on one side of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range. The stream graph is laid out on both sides of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range. The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point and determine whether it is smaller than a preset threshold T; if so, set the weight of the taxi-hailing difficulty distribution point as the activity weight of that POI.
(2.2) For each POI type, sum the activity of the POIs of that type and take the sum as the weighted activity of that POI type.
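Steps (2.1) and (2.2) can be sketched as below. The sketch assumes planar coordinates in the same unit as the threshold, and it assumes that when several taxi-hailing difficulty points lie within T of the same POI their weights are accumulated; the text does not say how that case is handled. The tuple layouts and the name `weighted_activity` are hypothetical.

```python
import math
from collections import defaultdict

def weighted_activity(taxi_points, pois, threshold):
    """Sum, per POI type, the weights of taxi points lying within `threshold` of each POI.

    taxi_points: iterable of (x, y, weight); pois: iterable of (x, y, poi_type).
    """
    totals = defaultdict(float)
    for tx, ty, weight in taxi_points:
        for px, py, poi_type in pois:
            # Step (2.1): strict "smaller than T" test on the Euclidean distance.
            if math.hypot(tx - px, ty - py) < threshold:
                # Step (2.2): accumulate the activity per POI type.
                totals[poi_type] += weight
    return dict(totals)

# Hypothetical planar coordinates in km.
taxi_points = [(0.0, 0.0, 3.0), (2.0, 0.0, 1.5)]
pois = [
    (0.1, 0.0, "restaurant"),
    (0.0, 0.2, "restaurant"),
    (2.1, 0.1, "hospital"),
    (5.0, 5.0, "school"),
]
activity = weighted_activity(taxi_points, pois, threshold=0.5)
```

The nested loop is quadratic in the number of points; a spatial index would be the usual replacement for larger data sets.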
(3) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to display air quality, temperature and POI weighted activity. The calendar heat map presents multidimensional data in two-dimensional form, with the magnitude of a value encoded by the depth of the color; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap displays the activity weights and the geographic distribution of the clusters of same-type POIs.
The activity weight of the clusters of same-type POIs is calculated as follows: find all POI distribution points whose Euclidean distance from each taxi-hailing difficulty distribution point is less than or equal to T, and denote them $\mathrm{POI}_{\mathrm{didi}}$. Collect the POI distribution points of the same type in $\mathrm{POI}_{\mathrm{didi}}$, calculate the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based clustering algorithm, and the newly calculated longitude and latitude of each cluster center are taken as the longitude and latitude of the POI center position.
The beneficial effects of the invention are as follows: unlike traditional air quality visualization, the invention visualizes air quality together with residents' travel data, so that a user can explore, from global to local and back to global, how air quality affects activity in different areas of a city, and analyze how residents' travel destinations change under the influence of air quality. Interaction lowers the cost of using the system for an analyst and gives a good presentation, and the system can reveal various patterns of air quality and residents' travel at four levels: air quality, temperature, POI weighted activity and offset rate.
Drawings
FIG. 1 shows the bar-box plot analysis component;
FIG. 2 shows the stream graph-stacked graph analysis component;
FIG. 3 shows the scatter matrix-GeoMap-calendar heat map analysis component;
FIG. 4 is a front-end dependency diagram of the system.
Detailed Description
The following detailed description is made with reference to the embodiments and the accompanying drawings.
The invention is based on the following data. The air quality data are published by environmental protection administrative departments at various levels, or by environmental monitoring stations authorized by them, and comprise daily reports and hourly reports. The hourly reports have a period of 1 hour: each monitoring station publishes a real-time report on the hour, whose indicators include the concentrations of SO2, NO2, O3, CO, PM2.5 and PM10; the daily reports give the 24-hour mean concentrations of SO2, NO2, O3, CO, PM2.5 and PM10 for the day. The atmospheric environment data are published by meteorological administrative departments at various levels, or by meteorological monitoring stations authorized by them, and likewise comprise daily and hourly reports. The hourly reports have a period of 1 hour: each monitoring station publishes a real-time report on the hour, whose indicators include air pressure, temperature, humidity, precipitation, wind direction and other data; the daily reports give the mean of the day's 24 hourly values of air pressure, temperature, humidity, precipitation and wind direction. The residents' travel data are the taxi-hailing difficulty data provided by the Didi Cangqiong big data platform, with a period of 1 hour: the taxi-hailing difficulty at different locations is provided on the hour, and each on-the-hour record comprises longitude, latitude and taxi-hailing difficulty. The POI distribution data are the detailed POI data, comprising POI address, POI name, POI longitude, POI latitude and POI type.
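The four data sources described above could be represented with record types such as the following. The fields mirror the indicators listed in the text, but the class names, field names and value types are hypothetical, and no units are implied.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AirQualityRecord:
    timestamp: datetime
    so2: float
    no2: float
    o3: float
    co: float
    pm25: float
    pm10: float

@dataclass(frozen=True)
class WeatherRecord:
    timestamp: datetime
    pressure: float
    temperature: float
    humidity: float
    precipitation: float
    wind_direction: float

@dataclass(frozen=True)
class TaxiDifficultyRecord:
    timestamp: datetime
    longitude: float
    latitude: float
    difficulty: float  # the weight of the taxi-hailing difficulty point

@dataclass(frozen=True)
class PoiRecord:
    address: str
    name: str
    longitude: float
    latitude: float
    poi_type: str

# Hypothetical sample records.
taxi = TaxiDifficultyRecord(datetime(2017, 3, 22, 9), 120.15, 30.28, 0.7)
poi = PoiRecord("1 Example Road", "Example Hospital", 120.151, 30.281, "hospital")
```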
The invention provides a big-data-driven method for the visual analysis of air quality and residents' travel, comprising the following steps:
(1) Reconstruction of the raw air quality data, temperature data, POI data and taxi-hailing difficulty data: first, the air quality data, temperature data, POI data and taxi-hailing difficulty data are each cleaned and sorted. Data cleaning mainly consists of finding and removing abnormal and missing values in the various data sources; all data are then sorted chronologically by timestamp, which facilitates the subsequent visualization of the time-series data. The taxi-hailing difficulty data comprise the geographic coordinates and weights of the taxi-hailing difficulty distribution points. The POI data comprise the geographic coordinates of the POI distribution points and the POI types.
(2) Calculation of the POI weighted activity and offset rate: the POI weighted activity reflects the volume of foot traffic around a POI; the offset rate reflects how the POI weighted activity changes.
The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point and determine whether it is smaller than a preset threshold T, where T may be, for example, 0.5 km; if so, set the weight of the taxi-hailing difficulty distribution point as the activity weight of that POI.
(2.2) For each POI type, sum the activity of the POIs of that type and take the sum as the weighted activity of that POI type.
The offset rate of the POI weighted activity is calculated as follows:
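$$\mathrm{Offset}_t = (\mathrm{POIWeight}_t - \mathrm{Aver}_{\mathrm{week},\mathrm{hour}}) / (\mathrm{POIWeight}_t) - 1$$
where $\mathrm{Aver}_{\mathrm{week},\mathrm{hour}}$ is the mean POI weighted activity for each hour of each week, $\mathrm{POIWeight}_t$ is the POI weighted activity for the current hour, and $\mathrm{Offset}_t$ is the offset rate.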
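Step (2.1) compares a Euclidean distance with a threshold T given in kilometres, while the source data carry longitude and latitude. The text does not say how the two are reconciled; one possible approach, shown here purely as an assumption, is to project the coordinate differences onto a local plane (equirectangular approximation) before taking the Euclidean distance. The Earth radius constant and the function name are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def planar_distance_km(lon1, lat1, lon2, lat2):
    """Euclidean distance in km after an equirectangular projection.

    Reasonable for short ranges such as a threshold of a few hundred metres.
    """
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_KM
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_KM
    return math.hypot(dx, dy)

T_KM = 0.5  # the example threshold given in step (2.1)

# Hypothetical points: about 0.11 km apart, then about 1.1 km apart.
near = planar_distance_km(120.0, 30.0, 120.0, 30.001) < T_KM
far = planar_distance_km(120.0, 30.0, 120.0, 30.01) < T_KM
```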
(3) Clustering of POIs of the same type: find all POI distribution points whose Euclidean distance from each taxi-hailing difficulty distribution point is less than or equal to T, and denote them $\mathrm{POI}_{\mathrm{didi}}$. Collect the POI distribution points of the same type in $\mathrm{POI}_{\mathrm{didi}}$, calculate the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based clustering algorithm, and the newly calculated longitude and latitude of each cluster center are taken as the longitude and latitude of the POI center position.
(4) Visual analysis of air quality and residents' travel, specifically:
(4.1) Visual color encoding: because air quality index (AQI) values differ, colors are mapped with a dynamic scheme, i.e. the color is adjusted dynamically according to the AQI value:
[Figure BDA0001251762300000071: formula mapping the air quality index value to $\mathrm{Color}_{\mathrm{rect}}$]
where $\mathrm{Color}_{\mathrm{rect}}$ is the fill color of the rectangle.
(4.2) Bar-box plot analysis component: the air quality index of each day is shown as a rectangle; the left-to-right order of the rectangles represents the chronological order of the days, the fill color of each rectangle is determined by the scheme of step 4.1, and its height is determined by the air quality index AQI. The box plot represents the hourly temperature for each week, and its boxes run from left to right in chronological order of the dates in the week. The dashed lines of the box plot represent the upper-quarter and lower-quarter data ranges respectively, the small rectangle in the center represents the range from the first to the third quartile, and the horizontal line in the middle of the small rectangle marks the median of the data, as shown in FIG. 1.
(4.3) Stream graph-stacked graph analysis component: the abscissa of the stacked graph and the stream graph is the hourly coordinate over the specified time range, with one week as the basic scale; the ordinate is the POI weighted activity value. The stacked graph uses area charts of different colors to represent different POI types; it is laid out on one side of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range. The stream graph is laid out on both sides of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range, as shown in FIG. 2.
(4.4) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to display air quality, temperature and POI weighted activity. The calendar heat map presents multidimensional data in two-dimensional form, with the magnitude of a value encoded by the depth of the color; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap displays the activity weights and the geographic distribution of the clusters of same-type POIs, as shown in FIG. 3.
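The two-dimensional arrangement behind a calendar heat map can be sketched as a grid with one row per week and one column per weekday, whose cells are later colored by value. The text does not specify the grid layout; using ISO weeks as rows, daily values as cells and the name `calendar_grid` are assumptions for illustration.

```python
from datetime import date

def calendar_grid(daily_values):
    """Arrange {date: value} into {(iso_year, iso_week): [Mon..Sun values or None]}."""
    grid = {}
    for day, value in daily_values.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        row = grid.setdefault((iso_year, iso_week), [None] * 7)
        row[iso_weekday - 1] = value  # ISO weekdays run from 1 (Monday) to 7 (Sunday)
    return grid

# Hypothetical daily offset rates for one POI type.
offsets = {
    date(2017, 3, 20): -0.5,   # Monday
    date(2017, 3, 22): 0.25,   # Wednesday
    date(2017, 3, 27): 0.1,    # Monday of the following week
}
grid = calendar_grid(offsets)
```

Cells left as `None` correspond to days without data and would be drawn without a color.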
A big-data-driven system for the visual analysis of air quality and residents' travel comprises the following components:
(1) Bar-box plot analysis component: the air quality index of each day is shown as a rectangle, and the left-to-right order of the rectangles represents the chronological order of the days. The height of each rectangle is determined by the air quality index AQI, and the fill color uses a dynamic mapping scheme, i.e. it is adjusted dynamically according to the AQI value:
[Figure BDA0001251762300000081: formula mapping the air quality index value to $\mathrm{Color}_{\mathrm{rect}}$]
where $\mathrm{Color}_{\mathrm{rect}}$ is the fill color of the rectangle.
The box plot represents the hourly temperature for each week, and its boxes run from left to right in chronological order of the dates in the week. The dashed lines of the box plot represent the upper-quarter and lower-quarter data ranges respectively, the small rectangle in the center represents the range from the first to the third quartile, and the horizontal line in the middle of the small rectangle marks the median of the data, as shown in FIG. 1.
(2) Stream graph-stacked graph analysis component: the abscissa of the stacked graph and the stream graph is the hourly coordinate over the specified time range, with one week as the basic scale; the ordinate is the POI weighted activity value. The stacked graph uses area charts of different colors to represent different POI types; it is laid out on one side of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range. The stream graph is laid out on both sides of the coordinate axis and shows how the weighted activity of one or more POI types changes over the specified time range, as shown in FIG. 2. The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point and determine whether it is smaller than a preset threshold T; if so, set the weight of the taxi-hailing difficulty distribution point as the activity weight of that POI.
(2.2) For each POI type, sum the activity of the POIs of that type and take the sum as the weighted activity of that POI type.
(3) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to display air quality, temperature and POI weighted activity. The calendar heat map presents multidimensional data in two-dimensional form, with the magnitude of a value encoded by the depth of the color; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap displays the activity weights and the geographic distribution of the clusters of same-type POIs, as shown in FIG. 3.
The activity weight of the clusters of same-type POIs is calculated as follows: find all POI distribution points whose Euclidean distance from each taxi-hailing difficulty distribution point is less than or equal to T, and denote them $\mathrm{POI}_{\mathrm{didi}}$. Collect the POI distribution points of the same type in $\mathrm{POI}_{\mathrm{didi}}$, calculate the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based clustering algorithm, and the newly calculated longitude and latitude of each cluster center are taken as the longitude and latitude of the POI center position.
In the preprocessing stage of the method, the POI weighted activity is obtained mainly by accumulating, around each taxi-hailing difficulty point, the number of POIs of each type, which yields a measure of the POI weighted activity; the offset rate of the POI weighted activity mainly measures how far the real-time POI activity deviates from the historical mean of the POI weighted activity. By drawing the bar-box plot, the stream graph and stacked graph, and the scatter matrix, GeoMap and calendar heat map, and by interacting across these views, a user can obtain an important reference for exploring residents' travel behavior, draw the attention of relevant departments such as transportation and healthcare to air quality, and offer them constructive suggestions.
Although the invention has been described with reference to a single embodiment that illustrates the various visualization components, the invention is clearly not limited to that embodiment and admits numerous modifications without departing from its basic spirit and scope.

Claims (2)

1. A big-data-driven visual analysis method for air quality and residents' travel, characterized in that the method comprises the following steps:
(1) Reconstruction of the raw air quality data, temperature data, POI data and taxi-difficulty data: the air quality data, temperature data, POI data and taxi-difficulty data are first each cleaned and sorted, where data cleaning mainly consists of finding and removing anomalous and missing values in each data source, after which all data are sorted chronologically by timestamp; the taxi-difficulty data comprise the geographic coordinates and weights of the taxi-difficulty distribution points; the POI data comprise the geographic coordinates and POI types of the POI distribution points;
(2) Calculation of the POI weighted activity and the offset rate: the POI weighted activity reflects the volume of pedestrian flow around a POI; the offset rate reflects how the POI weighted activity changes;
The POI weighted activity is calculated as follows:
(2.1) compute the Euclidean distance between a taxi-difficulty distribution point and each POI distribution point, and determine whether the Euclidean distance is less than a preset threshold T; if so, set the weight of the taxi-difficulty distribution point as the activity weight of that POI;
(2.2) for each POI type, accumulate the activity of the POIs of that type and take the sum as the weighted activity of that POI type;
The offset rate is calculated as follows:
Offset_t = (POIWeight_t - Aver_{week,hour}) / (POIWeight_t) - 1
where Aver_{week,hour} is the mean POI weighted activity per hour per week, POIWeight_t is the POI weighted activity of the current hour, and Offset_t is the offset rate;
(3) Clustering of POIs of the same type: for each taxi-difficulty distribution point, find all POI distribution points whose Euclidean distance from it is less than or equal to T, denoted POI_didi; group the POI distribution points of the same type in POI_didi, compute the position of the cluster center, and set the weight of the taxi-difficulty distribution point as the weight of the cluster center; here the POI distribution points are clustered with a k-means-based clustering algorithm, and the latitude and longitude coordinates of the newly computed cluster center are used as the latitude and longitude coordinates of the POI center position;
(4) Visual analysis of air quality and residents' travel, specifically:
(4.1) Color visual encoding: because the air quality index (AQI) varies, colors are mapped with a dynamic mapping scheme, that is, the mapping is adjusted dynamically according to the AQI value:
Figure FDA0002980052260000021
where Color_rect is the fill color of the rectangle;
(4.2) Bar chart-box plot analysis component: the air quality index of each day is shown as a rectangle, and the left-to-right order of the rectangles follows the order of the dates; the fill color of each rectangle is determined by the scheme of step 4.1 and its height by the air quality index (AQI); each box plot represents the hourly temperatures of one week, and the left-to-right order of the box plots follows the order of the weeks; the upper and lower dashed lines of a box plot represent the upper quarter and the lower quarter of the data range respectively, the small central rectangle represents the range from the first to the third quartile, and the horizontal line inside the small rectangle marks the median;
(4.3) Stream graph-stacked graph analysis component: the horizontal axis of the stacked graph and of the stream graph is the hourly time coordinate over the specified time range, with one week as the basic tick, and the vertical axis is the POI weighted activity value; in the stacked graph, area charts of different colors represent different POI types; the stacked graph is laid out on one side of the axis and shows how the weighted activity of one or more POI types changes over the specified time range; the stream graph is laid out on both sides of the axis and likewise shows how the weighted activity of one or more POI types changes over the specified time range;
(4.4) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to show air quality, temperature and POI weighted activity; the calendar heat map presents multidimensional data in two-dimensional form, with color depth encoding the magnitude of a value, and shows how the offset rate of the same POI changes under different air quality and temperature conditions; the GeoMap shows the activity weights and the geographic distribution of the clusters of same-type POIs.
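The offset-rate step of claim 1 can be illustrated with a short sketch. It takes the expression exactly as printed in the claim, (POIWeight_t - Aver_{week,hour}) / (POIWeight_t) - 1; text extraction may have altered where the trailing 1 belongs, so this reading is an assumption. The sketch also assumes that Aver_{week,hour} is the historical mean for the matching weekday and hour slot. The function names `slot_means` and `offset_rate` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def slot_means(history):
    """Mean weighted activity per (weekday, hour) slot.

    history: iterable of (datetime, activity) pairs (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in history:
        key = (ts.weekday(), ts.hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def offset_rate(ts, value, means):
    """Offset rate as printed in the claim: (w - avg) / w - 1."""
    avg = means[(ts.weekday(), ts.hour)]
    if value == 0:
        return None  # the printed expression is undefined for zero activity
    return (value - avg) / value - 1


# Illustrative data only: two earlier Mondays at 08:00, then the current hour.
history = [
    (datetime(2017, 3, 6, 8), 10.0),
    (datetime(2017, 3, 13, 8), 30.0),
]
means = slot_means(history)
rate = offset_rate(datetime(2017, 3, 20, 8), 40.0, means)
```

Keying the historical mean by weekday and hour keeps a Monday morning from being compared against a Sunday night, which appears to be the purpose of the week/hour subscript.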
2. A big-data-driven visual analysis system for air quality and residents' travel, characterized in that the system comprises the following components:
(1) Bar chart-box plot analysis component: the air quality index of each day is shown as a rectangle, and the left-to-right order of the rectangles follows the order of the dates; the height of each rectangle is determined by the air quality index (AQI), and the fill color uses a dynamic mapping scheme, that is, it is adjusted dynamically according to the AQI value:
Figure FDA0002980052260000031
each box plot represents the hourly temperatures of one week, and the left-to-right order of the box plots follows the order of the weeks; the upper and lower dashed lines of a box plot represent the upper quarter and the lower quarter of the data range respectively, the small central rectangle represents the range from the first to the third quartile, and the horizontal line inside the small rectangle marks the median;
(2) Stream graph-stacked graph analysis component: the horizontal axis of the stacked graph and of the stream graph is the hourly time coordinate over the specified time range, with one week as the basic tick, and the vertical axis is the POI weighted activity value; in the stacked graph, area charts of different colors represent different POI types; the stacked graph is laid out on one side of the axis and shows how the weighted activity of one or more POI types changes over the specified time range; the stream graph is laid out on both sides of the axis and likewise shows how the weighted activity of one or more POI types changes over the specified time range; the POI weighted activity is calculated as follows:
(2.1) compute the Euclidean distance between a taxi-difficulty distribution point and each POI distribution point, and determine whether the Euclidean distance is less than a preset threshold T; if so, set the weight of the taxi-difficulty distribution point as the activity weight of that POI;
(2.2) for each POI type, accumulate the activity of the POIs of that type and take the sum as the weighted activity of that POI type;
(3) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to show air quality, temperature and POI weighted activity; the calendar heat map presents multidimensional data in two-dimensional form, with color depth encoding the magnitude of a value, and shows how the offset rate of the same POI changes under different air quality and temperature conditions; the GeoMap shows the activity weights and the geographic distribution of the clusters of same-type POIs;
The offset rate is calculated as follows:
Offset_t = (POIWeight_t - Aver_{week,hour}) / (POIWeight_t) - 1
where Aver_{week,hour} is the mean POI weighted activity per hour per week, POIWeight_t is the POI weighted activity of the current hour, and Offset_t is the offset rate;
The activity weight of a cluster of same-type POIs is calculated as follows: for each taxi-difficulty distribution point, find all POI distribution points whose Euclidean distance from it is less than or equal to T, denoted POI_didi; group the POI distribution points of the same type in POI_didi, compute the position of the cluster center, and set the weight of the taxi-difficulty distribution point as the weight of the cluster center; here the POI distribution points are clustered with a k-means-based clustering algorithm, and the latitude and longitude coordinates of the newly computed cluster center are used as the latitude and longitude coordinates of the POI center position.
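The same-type POI clustering recited in the claims can be sketched as below. The claims name a k-means-based algorithm without giving the number of clusters; this sketch assumes a single cluster per POI type around each taxi-difficulty point, in which case the k-means center reduces to the coordinate mean. The function name `cluster_centers`, the tuple layouts and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def cluster_centers(taxi_point, pois, threshold):
    """Per-type centers of the POIs within distance T of one taxi point.

    taxi_point: (x, y, weight) tuple (hypothetical layout).
    pois: iterable of (x, y, poi_type) tuples (hypothetical layout).
    Returns {poi_type: (center_x, center_y, weight)}.
    """
    tx, ty, weight = taxi_point
    groups = defaultdict(list)
    for px, py, poi_type in pois:
        # POI_didi: the POIs whose distance from the taxi point is at most T.
        if math.hypot(tx - px, ty - py) <= threshold:
            groups[poi_type].append((px, py))
    centers = {}
    for poi_type, coords in groups.items():
        # With one cluster per type, the k-means center is the coordinate mean.
        cx = sum(x for x, _ in coords) / len(coords)
        cy = sum(y for _, y in coords) / len(coords)
        # The taxi point's weight becomes the weight of the cluster center.
        centers[poi_type] = (cx, cy, weight)
    return centers


# Illustrative data only; none of these values come from the patent.
taxi_point = (0.0, 0.0, 4.0)
pois = [
    (1.0, 0.0, "mall"),
    (0.0, 1.0, "mall"),
    (-1.0, 0.0, "school"),
    (9.0, 9.0, "mall"),  # outside T, so excluded from POI_didi
]
centers = cluster_centers(taxi_point, pois, threshold=2.0)
```

Each resulting center could then be drawn on the GeoMap as one marker per POI type, with the weight driving its visual encoding.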
CN201710173669.4A 2017-03-22 2017-03-22 Visual analysis method and system for air quality and residents' travel Active CN106991525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710173669.4A CN106991525B (en) 2017-03-22 2017-03-22 Visual analysis method and system for air quality and residents' travel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710173669.4A CN106991525B (en) 2017-03-22 2017-03-22 Visual analysis method and system for air quality and residents' travel

Publications (2)

Publication Number Publication Date
CN106991525A CN106991525A (en) 2017-07-28
CN106991525B true CN106991525B (en) 2021-06-18

Family

ID=59411741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710173669.4A Active CN106991525B (en) 2017-03-22 2017-03-22 Visual analysis method and system for air quality and residents' travel

Country Status (1)

Country Link
CN (1) CN106991525B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7160672B2 (en) * 2018-12-28 2022-10-25 株式会社キーエンス gas flow meter
CN110286663B (en) * 2019-06-28 2021-05-25 云南中烟工业有限责任公司 Improvement method of standardized production of cigarette physical indicators based on regional
CN112699284B (en) * 2021-01-11 2022-08-30 四川大学 Bus stop optimization visualization method based on multi-source data
CN118828254B (en) * 2024-09-11 2024-11-29 贵州桥梁建设集团有限责任公司 Meteorological data transmission and sharing method for highways based on Internet of Things

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7826965B2 (en) * 2005-06-16 2010-11-02 Yahoo! Inc. Systems and methods for determining a relevance rank for a point of interest
US7991561B2 (en) * 2005-09-29 2011-08-02 Roche Molecular Systems, Inc. Ct determination by cluster analysis with variable cluster endpoint
US8669884B2 (en) * 2011-02-02 2014-03-11 Mapquest, Inc. Systems and methods for generating electronic map displays with points of-interest information
WO2014194480A1 (en) * 2013-06-05 2014-12-11 Microsoft Corporation Air quality inference using multiple data sources
CN105679009B (en) * 2016-02-03 2017-12-26 西安交通大学 A kind of call a taxi/order POI commending systems and method excavated based on GPS data from taxi
CN105825672B (en) * 2016-04-11 2019-06-14 中山大学 A city guidance area extraction method based on floating car data

Also Published As

Publication number Publication date
CN106991525A (en) 2017-07-28

Similar Documents

Publication Publication Date Title
CN106991525B (en) Visual analysis method and system for air quality and residents' travel
Luo et al. Trans-boundary air pollution in a city under various atmospheric conditions
US9183221B2 (en) Component and method for overlying information bearing hexagons on a map display
CN105095481B (en) Extensive taxi OD data visualization analysis methods
Zheng et al. U-air: When urban air quality inference meets big data
CN110598953A (en) Space-time correlation air quality prediction method
CN110555544B (en) A Traffic Demand Estimation Method Based on GPS Navigation Data
CN110427533A (en) Pollution spread mode visible analysis method and system based on timing Particle tracking
Wadlow et al. Understanding spatial variability of air quality in Sydney: Part 2—A roadside case study
Xu et al. A gradient boost approach for predicting near-road ultrafine particle concentrations using detailed traffic characterization
CN112699284A (en) Bus stop optimization visualization method based on multi-source data
Cummings et al. Mobile monitoring of air pollution reveals spatial and temporal variation in an urban landscape
CN112906941B (en) Forecasting method and system for dynamic related air quality time series
CN102646151A (en) A Variable Window Prospective Spatiotemporal Rearrangement Scanning Algorithm Based on Poisson Distribution
CN113420984A (en) Method for determining air pollution source area
Fitzmaurice et al. Assessing vehicle fuel efficiency using a dense network of CO 2 observations
Chen et al. A spatiotemporal interpolation graph convolutional network for estimating PM₂. ₅ concentrations based on urban functional zones
CN111400877B (en) Intelligent city simulation system and method based on GIS data
Snowdon et al. Spatiotemporal traffic volume estimation model based on GPS samples
US20240183832A1 (en) Detection of volatile organic compounds
MUN et al. Analysis of PM 2.5 Distribution Contribution using GIS Spatial Interpolation-Focused on Changwon-si Urban Area
CN115510056B (en) A data processing system that uses mobile phone signaling data for macroeconomic analysis
US11553318B2 (en) Routing method for mobile sensor platforms
CN111833229B (en) Subway dependency-based travel behavior space-time analysis method and device
Cummings The Spatiotemporal Variation of Air Pollution in Philadelphia, Pennsylvania, and Its Relationship with Urban Structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant