CN107529135A - User Activity type identification method based on smart machine data - Google Patents

Info

Publication number
CN107529135A
Authority
CN
China
Prior art keywords
user
activity
stay
smart device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610443684.1A
Other languages
Chinese (zh)
Inventor
杨超
朱荣荣
许项东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201610443684.1A
Publication of CN107529135A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02Services making use of location information
    • H04W4/029Location-based management or tracking services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for identifying user activity types based on smart device data, comprising the following steps: acquiring the user's smart device data; cleaning the data to obtain the user's movement trajectory, and obtaining the user's trip chain through trip identification; extracting the activity start time and activity duration, and obtaining the land use type of each stay segment from the points of interest (POIs) in that segment; analyzing the smart device data from multiple days of travel to locate the user's home and/or workplace, thereby identifying the two activity types "at home" and "at work"; and, for activities other than at home and at work, inputting the activity features into an activity classifier to obtain the corresponding activity type. The invention not only derives the user's activity types from smart device data; the time and space thresholds used in the analysis are also calibrated against traditional travel survey data, which makes the method more practical and accurate and improves its real-time performance.

Description

Method for identifying user activity types based on smart device data

Technical field

The present invention relates to the analysis of user smart device data, and in particular to a method for identifying user activity types based on smart device data.

Background

Information on urban residents' travel activities is an important basis for urban planning, traffic management, and research on resident activity, and is generally obtained through resident travel surveys. Traditional survey methods include home visits, telephone interviews, and postcard surveys. At present, surveys in China rely mainly on household interviews and paper questionnaires. The drawbacks of this approach are obvious: a heavy burden on respondents, low accuracy, high cost, and a low sampling rate. It can no longer meet the needs of rapid urban development.

Smart device data has become an indispensable data source for transportation big-data research because it covers a wide user base, requires no special collection equipment, is cheap to obtain, and is available in large volumes. Extracting user activity information from smart device data can greatly increase the survey sample size, shorten the survey cycle, and reduce survey cost. Moreover, collecting information from smart device signaling data is passive: residents do not need to fill in questionnaires and bear no additional burden, so the transportation sector gains richer basic information.

At present, smart device data yields time-stamped location information, and hence a user's movement trajectory over a day, but it provides neither the user's socioeconomic attributes nor, directly, travel activity attributes such as travel mode and activity type.

Existing activity type identification methods concentrate mostly on stop point identification and home/workplace identification, and can only recognize the two activity types "home" and "work". Specifically, the positioning records are matched to traffic zones by longitude and latitude; from the matched table, one continuous week of data is extracted for a user; the number of appearances in each traffic zone is counted separately for the home judgment period and the work judgment period; and the zones with the most appearances are taken as the user's home and workplace respectively. Because this approach counts only the number of stays and ignores stay duration, it easily mistakes a place with many short stays for home or work. Methods for identifying activity types other than "home" and "work" have received little study and have poor real-time performance.

Summary of the invention

In view of this, and because methods for identifying activity types other than "home" and "work" have received little study and have poor real-time performance, it is necessary to provide a method for identifying user activity types based on smart device data with better real-time performance.

A method for identifying user activity types based on smart device data comprises the following steps:

acquiring the user's smart device data;

cleaning the smart device data to obtain the user's movement trajectory, and dividing the trajectory into stay segments and travel segments through trip identification to obtain the user's trip chain;

extracting the activity start time and activity duration of each stay segment in the trip chain, and obtaining the land use type of each stay segment from the points of interest (POIs) in that segment;

analyzing the smart device data for the user's travel over a preset period to obtain, for each user, the stay periods, stay durations, and number of stays, and from these determining the location of the user's home and/or workplace, so as to identify the two activity types "at home" and "at work";

inputting the activity start time, activity duration, and land use type of each stay segment not classified as at home or at work into an activity classifier, so as to obtain, for each user, the predefined activity types other than at home and at work.

In one embodiment, the method further comprises constructing the activity classifier, including:

extracting, from traditional travel survey data, the activity start time, activity duration, and land use type of each user's stay segments, and constructing a decision-tree-based activity classifier from them.
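As an illustration of how a decision-tree classifier over these three features might be built, the following is a minimal pure-Python sketch. It is not the classifier of the patent: the splitting rule (greedy Gini impurity), the function names, the toy records, and the activity labels are all assumptions made for the example, and a real implementation would train on the actual survey data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def goes_left(row, feature, value):
    """Numeric features split on '<= value'; categorical ones on equality."""
    if isinstance(value, (int, float)):
        return row[feature] <= value
    return row[feature] == value

def build_tree(rows, labels):
    """Grow a tree greedily; a leaf is a label, a node is (feature, value, left, right)."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for value in {row[feature] for row in rows}:
            side = [goes_left(row, feature, value) for row in rows]
            left = [y for y, s in zip(labels, side) if s]
            right = [y for y, s in zip(labels, side) if not s]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:  # keep only splits that reduce impurity
                best, best_score = (feature, value), score
    if best is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    feature, value = best
    side = [goes_left(row, feature, value) for row in rows]
    left = [(r, y) for r, y, s in zip(rows, labels, side) if s]
    right = [(r, y) for r, y, s in zip(rows, labels, side) if not s]
    return (feature, value,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        feature, value, left, right = tree
        tree = left if goes_left(row, feature, value) else right
    return tree

# Made-up records, NOT survey data: (start hour, duration in minutes, land use type).
rows = [(12.0, 45, "commercial"), (12.5, 50, "commercial"),
        (19.0, 120, "commercial"), (15.0, 90, "commercial"),
        (10.0, 60, "park"), (16.0, 100, "park")]
labels = ["dining", "dining", "shopping", "shopping", "leisure", "leisure"]
tree = build_tree(rows, labels)
print(predict(tree, (12.2, 40, "commercial")))
```

A library implementation with pruning and cross-validation would normally be preferred; the sketch only shows how start time, duration, and land use type can jointly drive the splits.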

In one embodiment, the data cleaning includes handling missing fields, deleting records with abnormal IMSI numbers, deleting records that cannot be matched to the base station positioning data, deleting duplicate data, ping-pong effect handling, and signal drift handling.

In one embodiment, the ping-pong effect handling comprises: merging each user's smart device data into regions by space and time; if the user's signal fluctuates within a range smaller than the spatial threshold L1 for longer than the time threshold T1, the user is considered to have been at the same location during that period.

In one embodiment, the signal drift handling comprises: merging each user's smart device data into regions by space and time; if the user leaves the range defined by the spatial threshold L2 and returns to it within the time threshold T2, the user is considered to have remained at the same location.

In one embodiment, the trip identification comprises: if the user's trajectory points within the time threshold T_stay cluster within a radius L_stay, or the user's moving speed within T_stay is below the speed threshold V_stay, the corresponding segment is a stay segment; otherwise it is a travel segment.

In one embodiment, the method further comprises the following steps:

converting the trip data in the travel survey data into activity chain data, and selecting the data of users who leave home in the morning and return home in the evening;

extracting the durations of activities of all types, building the activity duration distribution, and taking its p-th percentile as the trip identification time threshold T_stay, where p is any natural number from 5 to 10.

In one embodiment, obtaining the land use type of a stay segment from its POIs comprises the following steps:

computing the center coordinates of the stay location by time weighting;

building a kernel density estimation model from the positions and number of the POIs around the center coordinates, with the following formula:

K(.) is the kernel function;

r is the bandwidth;

n is the total number of POIs;

d_{i,s} is the distance from the center coordinates to each POI s;

a Gaussian function is chosen as the kernel function:

computing the kernel density value of each POI type at the stay location, and taking the land use type corresponding to the POI type with the highest kernel density value as the land use type of the stay location.
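For reference, a common two-dimensional kernel density form that is consistent with the symbols K, r, n, and d_{i,s} defined above is given here. It is an assumption, not necessarily the exact expression used in the patent; in particular, the normalization factor and the use of i as the POI index are choices made for this sketch.

```latex
f(s) = \frac{1}{n r^{2}} \sum_{i=1}^{n} K\!\left(\frac{d_{i,s}}{r}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^{2}}{2}\right)
```

Because the land use type is chosen by comparing density values at the same location, any normalization constant shared by all POI types does not affect which type ranks highest.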

In one embodiment, analyzing the smart device data for the user's travel over the preset period to obtain the stay periods, stay durations, and number of stays, and from these determining the location of the user's home and/or workplace so as to identify the two activity types at home and at work, comprises the following steps:

extracting all activity start times, activity durations, and land use types for each user;

selecting each user's workday data, the total number of days being N;

for each stay location, counting the number of days N_home on which the nighttime stay duration exceeds T_home;

if N_home exceeds the first judgment threshold, the location is the home; otherwise, counting the number of days N_work on which the stay duration during working hours exceeds T_work;

if N_work exceeds the second judgment threshold, the location is the workplace.

In one embodiment, the method further comprises the following steps:

converting the trip data in the travel survey data into activity chain data, and selecting the data of users who leave home in the morning and return home in the evening;

extracting the activities whose nighttime stay location is home, building the activity duration distribution, and taking its a-th percentile as the home identification time threshold T_home, where a is any natural number from 5 to 10;

extracting the activities whose stay location during working hours is the workplace, building the activity duration distribution, and taking its b-th percentile as the workplace identification time threshold T_work, where b is any natural number from 5 to 10.

The method provided by the present invention not only derives the user's home and workplace from the user's smart device data, but also identifies and counts activity types other than at home and at work fairly accurately. Those other activity types are determined by extracting the relevant activity features from traditional travel survey data and building a decision-tree-based activity classifier. In addition, the trip identification time threshold T_stay, the home identification time threshold T_home, and the workplace identification time threshold T_work are all extracted from analysis of traditional travel survey data, which serves as calibration and makes the method more practical and accurate and improves its real-time performance.

Brief description of the drawings

Fig. 1 is a flowchart of the method for identifying user activity types based on smart device data according to an embodiment of the present invention;

Fig. 2 is another flowchart of the method for identifying user activity types based on smart device data according to an embodiment of the present invention;

Fig. 3 is a flowchart of determining the user's home and workplace in an embodiment of the present invention;

Fig. 4 is the cumulative frequency distribution of the durations of activities of all types in the 2009 Shanghai traditional travel survey data;

Fig. 5 is the cumulative frequency distribution of the durations of all nighttime at-home activities in the 2009 Shanghai traditional travel survey data;

Fig. 6 is the cumulative frequency distribution of the durations of all work activities in the 2009 Shanghai traditional travel survey data;

Fig. 7 is a schematic diagram of the rules for identifying other activity types, derived from the 2009 Shanghai traditional travel survey data.

Detailed description

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.

Fig. 1 is a flowchart of the method for identifying user activity types based on smart device data provided by the present invention. The overall flow is explained first. Smart device data for all users in a city or region is obtained from mobile network operators, smart device apps, and other public sources. The data is then cleaned, organized, and segmented, and activity features are extracted to determine whether the user's activity type is at home or at work. If it is neither, the activity features are input into the activity classifier to determine the specific activity type. The stay/travel time threshold used during cleaning and segmentation, and the at-home and at-work activity duration thresholds used in home/workplace identification, are all derived from analysis of the 2009 Shanghai traditional travel survey data. The activity classifier is also built from that survey data, as described in detail below. The smart device may be a mobile phone, an iPad, a wearable device, and so on.

Referring also to Fig. 2, in one embodiment the method provided by the present invention comprises the following steps:

S202: acquire the user's smart device data.

The smart device data in the present invention falls broadly into two categories by source. The first is call detail record (CDR) data and signaling data obtained from mobile network operators (China Mobile, China Unicom, China Telecom). The main fields are the anonymized user identifier, the cell number of the base station, the event type, and the time of the event. CDR data and signaling data differ mainly in their event types: CDR data covers outgoing calls, incoming calls, hard handovers, sending SMS, receiving SMS, and so on, while signaling data covers the CDR events and additionally power-on, power-off, cell handover, location update, and so on. CDR data is thus a subset of signaling data, and signaling data samples the user's location at a higher rate. Using the longitude and latitude of the base stations, the user's movement in the mobile network can be mapped onto the real geographic system. The second category is user trajectory data obtained from smart device apps with track recording (such as GeoLife, Bikely, SportsDo); its main content includes the sampling time and the longitude and latitude. Taking smart device CDR data from Shenzhen as an example, the data fields usually include the user identifier, cell identifier, sector identifier, access time, and so on, as shown in Table 1.

Table 1

S204: clean the smart device data to obtain the user's movement trajectory, and divide the trajectory into stay segments and travel segments through trip identification to obtain the user's trip chain.

Specifically, in one embodiment, cleaning the smart device data includes handling missing fields, deleting records with abnormal IMSI (International Mobile Subscriber Identity) numbers, deleting records that cannot be matched to the base station positioning data, deleting duplicate data, ping-pong effect handling, and signal drift handling.

Missing-field handling means deleting records in which key fields are missing, for example records whose base station number is 0 or whose time field is absent.

Deleting records with abnormal IMSI numbers. Anomalies in the storage process may produce some abnormal IMSI numbers.

Deleting records that cannot be matched to the base station data. The study in this embodiment covers Shanghai. Because of signal issues, some records may be located at base stations in neighboring provinces; any such records are deleted.

Deleting duplicate data. Besides genuine duplicates, records may also be duplicated because of precision: for example, when timestamps are truncated to the second, records that were not made at the same instant end up with identical timestamps.
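The four record-level cleaning rules above can be sketched as a single filtering pass. This is an illustrative sketch only: the record layout (a dict with `imsi`, `cell_id`, and `timestamp` keys), the function names, and the default validity check (digits only, at most 15 of them) are assumptions, and anonymized identifiers in real data may need a different check.

```python
def default_id_ok(imsi):
    """Assumed validity rule: an IMSI is a string of at most 15 digits."""
    return imsi.isdigit() and len(imsi) <= 15

def clean_records(records, known_cells, id_ok=default_id_ok):
    """Apply the record-level cleaning rules and return the surviving records.

    records: iterable of dicts with keys 'imsi', 'cell_id', 'timestamp'
             (a hypothetical schema chosen for this sketch).
    known_cells: set of cell identifiers present in the base station table.
    """
    seen = set()
    cleaned = []
    for rec in records:
        imsi = rec.get("imsi")
        cell = rec.get("cell_id")
        ts = rec.get("timestamp")
        # Missing-field handling: key field absent, or base station number 0.
        if not imsi or not cell or ts is None:
            continue
        # Abnormal identifier.
        if not id_ok(imsi):
            continue
        # Record cannot be matched to the base station table
        # (for example, a cell located in a neighboring province).
        if cell not in known_cells:
            continue
        # Duplicate: same user, same cell, same second.
        key = (imsi, cell, ts)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Ping-pong and signal drift handling operate on sequences of points rather than on single records, so they are better treated as separate passes over each user's time-ordered trajectory.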

Specifically, the ping-pong effect handling in one embodiment comprises: merging each user's smart device data into regions by space and time; if the user's signal fluctuates within a range smaller than the spatial threshold L1 for longer than the time threshold T1, the user is considered to have been at the same location during that period. More specifically, the spatial threshold L1 is a diameter of 400-500 meters and the time threshold T1 is 25-30 minutes.
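One way to apply the ping-pong rule is a greedy pass over each user's time-ordered points. The sketch below assumes points are `(t_seconds, x_m, y_m)` tuples in a projected coordinate system, measures the range from the first point of each run (a simplification of the diameter criterion), and replaces a qualifying run with its mean position; all of these are choices made for illustration, as is the function name.

```python
from math import hypot

def merge_ping_pong(points, l1=500.0, t1=25 * 60):
    """Collapse runs that oscillate within l1 meters for longer than t1 seconds.

    points: time-ordered list of (t_seconds, x_m, y_m) tuples.
    Returns a new list in which each qualifying run shares one position.
    """
    out = []
    i = 0
    while i < len(points):
        j = i
        # Extend the run while each next point stays within l1 of the run's first point.
        while (j + 1 < len(points)
               and hypot(points[j + 1][1] - points[i][1],
                         points[j + 1][2] - points[i][2]) < l1):
            j += 1
        run = points[i:j + 1]
        if run[-1][0] - run[0][0] > t1:
            # Long enough: treat the whole run as one location (its mean position).
            cx = sum(p[1] for p in run) / len(run)
            cy = sum(p[2] for p in run) / len(run)
            out.extend((t, cx, cy) for t, _, _ in run)
        else:
            out.extend(run)
        i = j + 1
    return out
```

The defaults use the upper end of the 400-500 meter range and the lower end of the 25-30 minute range quoted above.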

Specifically, the signal drift handling in one embodiment comprises: merging each user's smart device data into regions by space and time; if the user leaves the range defined by the spatial threshold L2 and returns to it within the time threshold T2, the user is considered to have remained at the same location. That is, when the user's records leave the small spatial range L2 for a short time and then quickly return, the user is also treated as being at the same location. More specifically, this applies when the position switching speed on leaving and on returning to the region exceeds 100 km/h (the upper design speed limit of urban expressways) and the time spent outside the region does not exceed T_clean. More specifically, the spatial threshold L2 is a diameter of 400-500 meters and the time threshold T2 is 25-30 minutes.
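The drift rule can be sketched as follows, again assuming time-ordered `(t_seconds, x_m, y_m)` points in projected coordinates. The text does not give a value for T_clean, so the sketch uses a single hypothetical parameter `max_away_s` (defaulting to 25 minutes, the lower end of T2) for the maximum time away; the function names and the choice to snap drifted points back to the anchor position are likewise assumptions.

```python
from math import hypot

def speed(a, b):
    """Speed in m/s between two (t, x, y) points; infinite if no time elapsed."""
    dt = b[0] - a[0]
    if dt <= 0:
        return float("inf")
    return hypot(b[1] - a[1], b[2] - a[2]) / dt

def remove_drift(points, l2=500.0, max_away_s=25 * 60, v_max=100 / 3.6):
    """Snap short, implausibly fast excursions back to the location they left.

    points: time-ordered (t_seconds, x_m, y_m) tuples; returns a corrected copy.
    v_max is 100 km/h expressed in m/s.
    """
    pts = list(points)
    i = 0
    while i < len(pts) - 2:
        t0, x0, y0 = pts[i]
        # Scan forward over the points that lie outside the l2 range of pts[i].
        k = i + 1
        while k < len(pts) and hypot(pts[k][1] - x0, pts[k][2] - y0) >= l2:
            k += 1
        left_and_returned = i + 1 < k < len(pts)
        snapped = False
        if left_and_returned:
            too_fast = (speed(pts[i], pts[i + 1]) > v_max
                        and speed(pts[k - 1], pts[k]) > v_max)
            if too_fast and pts[k][0] - t0 <= max_away_s:
                for m in range(i + 1, k):
                    pts[m] = (pts[m][0], x0, y0)  # snap the excursion back
                snapped = True
        i = k if snapped else i + 1
    return pts
```

An excursion at ordinary travel speed is left untouched, so a genuine short trip that returns to its origin is not erased.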

Specifically, in one embodiment, trip identification on the user's movement trajectory comprises: if the user's trajectory points within the time threshold T_stay cluster within a radius L_stay, or the user's moving speed within T_stay is below the speed threshold V_stay, the corresponding segment is a stay segment; otherwise it is a travel segment.

Specifically, referring to Fig. 4, in one embodiment the 2009 Shanghai traditional travel survey data is converted into activity chain data, and the data of users who leave home in the morning and return home in the evening is selected. The durations of activities of all types are then extracted, the activity duration distribution is built, and its p-th percentile is taken as the trip identification time threshold T_stay, where p is any natural number from 5 to 10. In this embodiment the p-th percentile is defined as follows: all n activity durations from the 2009 Shanghai survey data are sorted in ascending order and m = n × p% is computed; the duration at position m in the sorted list is the p-th percentile. If m is not an integer, for example 12.3, the average of the 12th and 13th durations is used. In this embodiment the 5th percentile of the activity duration distribution, 25 minutes, is taken as T_stay. In other embodiments T_stay may instead be the duration at the 6th, 7th, 8th, 9th, or 10th percentile, depending on the application. More specifically, the speed threshold V_stay is 1 m/s, L_stay is 200-500 meters, and T_stay is 5-25 minutes; the exact values of L_stay and T_stay should be chosen in light of the whole activity chain and the actual situation.
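The spatial part of the stay/travel rule can be sketched as a greedy scan. The sketch implements only the clustering criterion (points staying within L_stay of the first point of a run for at least T_stay) and omits the alternative speed criterion; the function name, the `(t_seconds, x_m, y_m)` point layout, and the default values (300 meters, 25 minutes) are assumptions picked from within the ranges quoted above.

```python
from math import hypot

def find_stays(points, l_stay=300.0, t_stay=25 * 60):
    """Return (start_time, end_time) pairs for the stay segments.

    points: time-ordered (t_seconds, x_m, y_m) tuples for one user.
    Everything between two consecutive stay segments is a travel segment.
    """
    stays = []
    i = 0
    while i < len(points):
        j = i
        # Extend the run while each next point stays within l_stay of the run's first point.
        while (j + 1 < len(points)
               and hypot(points[j + 1][1] - points[i][1],
                         points[j + 1][2] - points[i][2]) <= l_stay):
            j += 1
        if points[j][0] - points[i][0] >= t_stay:
            stays.append((points[i][0], points[j][0]))
            i = j + 1
        else:
            i += 1
    return stays
```

For each returned pair, the activity start time is the first element and the activity duration is the difference between the two, which are the features extracted in the next step.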

S206: extract the activity start time and activity duration of each stay segment in the trip chain, and obtain the land use type of each stay segment from the POIs in that segment. In one embodiment, obtaining the land use type of a stay segment comprises the following steps:

Compute the center coordinates of the stay location by time weighting. This comprises the following steps:

First, for each pair of consecutive stationary points p_i and p_{i+1} in the candidate stay location of the same smart device user, compute the average coordinates:

$$\bar{x}_{i,i+1} = \frac{p_i.x + p_{i+1}.x}{2}, \qquad \bar{y}_{i,i+1} = \frac{p_i.y + p_{i+1}.y}{2}$$

p_i.x is the longitude of stationary point p_i;

p_i.y is the latitude of stationary point p_i.

Next, the ratio of the time interval Δt_(i,i+1) between the two stationary points to the duration s.Δt of the whole candidate stay location is used as the weight of the average coordinates:

$$w_{i,i+1} = \frac{\Delta t_{(i,i+1)}}{s.\Delta t}$$

Δt_(i,i+1) is the time interval between the two stationary points p_i and p_{i+1};

s.Δt is the stay duration of the whole candidate stay location.

Finally, the center coordinates (s.x, s.y) of the candidate stay location are computed as the weighted sum:

$$s.x = \sum_{i} w_{i,i+1}\,\bar{x}_{i,i+1}, \qquad s.y = \sum_{i} w_{i,i+1}\,\bar{y}_{i,i+1}$$

(s.x, s.y) are the center coordinates of candidate stay location s.
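The three steps above (pairwise average, time-interval weight, weighted sum) translate directly into a short function. The function name and the `(t, x, y)` tuple layout, with x as longitude and y as latitude, are assumptions made for this sketch.

```python
def weighted_center(points):
    """Time-weighted center of one candidate stay location.

    points: time-ordered (t, x, y) tuples, x = longitude, y = latitude.
    Each consecutive pair contributes its average coordinates, weighted by
    the share of the total stay duration that elapsed between the two points.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = points[-1][0] - points[0][0]  # duration of the whole stay
    if total <= 0:
        raise ValueError("stay duration must be positive")
    sx = sy = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        w = (t1 - t0) / total          # weight of this pair
        sx += w * (x0 + x1) / 2        # weighted average longitude
        sy += w * (y0 + y1) / 2        # weighted average latitude
    return sx, sy
```

For example, with points `(0, 0.0, 0.0)`, `(10, 2.0, 0.0)`, `(40, 2.0, 4.0)`, the second pair spans three quarters of the stay and therefore pulls the center toward its own midpoint.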

Build a kernel density estimation model from the positions and number of the POIs around the center coordinates, with the following formula:

K(.) is the kernel function;

r is the bandwidth;

n is the total number of POIs;

d_{i,s} is the distance from the center coordinates to each POI s;

A Gaussian function is chosen as the kernel function:

Compute the kernel density value of each POI type at the stay location, and take the land use type corresponding to the POI type with the highest kernel density value as the land use type of the stay location.
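A minimal sketch of this selection step follows. It assumes a standard Gaussian kernel and the common density form sum(K(d/r)) / (n r^2), with n taken as the total number of POIs across all types so that the types are compared on the same scale; this normalization, the 200 meter default bandwidth, the function names, and the POI type names in the usage example are assumptions, not values from the text.

```python
from math import exp, hypot, pi, sqrt

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def dominant_land_use(center, pois_by_type, r=200.0):
    """Pick the POI type with the highest kernel density at the stay center.

    center: (x_m, y_m) of the stay location in projected coordinates.
    pois_by_type: dict mapping a POI type to a list of (x_m, y_m) positions.
    r: bandwidth in meters (assumed value).
    Returns (best_type, densities), or (None, {}) if there are no POIs.
    """
    n_total = sum(len(pts) for pts in pois_by_type.values())
    if n_total == 0:
        return None, {}
    densities = {}
    for kind, pts in pois_by_type.items():
        kernel_sum = sum(
            gaussian_kernel(hypot(x - center[0], y - center[1]) / r)
            for x, y in pts
        )
        densities[kind] = kernel_sum / (n_total * r * r)
    best = max(densities, key=densities.get)
    return best, densities
```

A call such as `dominant_land_use((0, 0), {"commercial": [(50, 0), (0, 80)], "residential": [(900, 0)]})` returns `"commercial"` as the best type, since the nearby commercial POIs dominate the distant residential one.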

S208: analyze the smart device data for the user's travel over a preset period to obtain each user's stay periods, stay durations, and number of stays, and use these to determine the location of the user's home and/or workplace, thereby identifying the two activity types "at home" and "at work". In this embodiment the preset period is multiple days; in other embodiments it may be half a month, one month, several months, one year, or several years, depending on the actual situation.

Referring to Figure 3, in one embodiment the specific steps are as follows:

Extract the activity start time, activity duration, and land use type of all activities of each user; select each user's weekday data and count the total number of days N.

For each stay location, count the total number of days N_home on which the night-time stay duration is greater than T_home.

If N_home is greater than the first judgment threshold, the location is the home location. Otherwise, count the total number of days N_work on which the stay duration during working hours is greater than T_work.

If N_work is greater than the second judgment threshold, the location is the workplace; otherwise, the location is the destination of another activity. Here night-time means 20:00 to 7:00 the next day, and working hours means 9:00 to 17:00. In this embodiment the first and second judgment thresholds are both 60% of the total number of days N; other embodiments may use other percentages according to the actual situation.
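The home/workplace rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-day stay durations (in minutes) at one location have already been aggregated for the night period and the working period, and the function name `classify_location` and its parameter names are hypothetical. The default values 540, 165, and 0.6 are the ones quoted in this embodiment.

```python
from typing import Sequence


def classify_location(
    night_minutes: Sequence[float],
    work_minutes: Sequence[float],
    n_days: int,
    t_home: float = 540.0,   # T_home in minutes
    t_work: float = 165.0,   # T_work in minutes
    ratio: float = 0.6,      # both judgment thresholds as a share of N
) -> str:
    """Label one stay location as 'home', 'work', or 'other'.

    night_minutes / work_minutes: one entry per weekday on which the
    user stayed at this location (20:00-7:00 and 9:00-17:00 respectively).
    """
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    n_home = sum(1 for m in night_minutes if m > t_home)
    if n_home > ratio * n_days:
        return "home"
    n_work = sum(1 for m in work_minutes if m > t_work)
    if n_work > ratio * n_days:
        return "work"
    return "other"


if __name__ == "__main__":
    print(classify_location([600] * 7 + [0] * 3, [0] * 10, n_days=10))
```

The home test is applied first, so a location that satisfies both conditions is labelled as home, matching the order of the steps.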

More specifically, in this embodiment the time threshold T_home for identifying the home location is calibrated on the 2009 Shanghai traditional user travel survey data. The trip data in the survey are converted into activity chain data, and the user smart device data of users who leave home in the morning and return home in the evening are selected. Activities whose stay location is home during the night period (20:00 to 7:00 the next day) are extracted, the distribution of activity duration is built, and the a-th percentile of that distribution is taken as the time threshold T_home, where a is any natural number from 5 to 10. In this embodiment the a-th percentile is obtained as follows: based on the 2009 Shanghai survey data, all n activity durations are sorted in ascending order and m = n × a% is computed; the m-th duration in the sorted list is the value of the a-th percentile. If m is not an integer, for example 12.3, the mean of the 12th and 13th durations in the sorted list is taken as the value of the a-th percentile. Referring to Figure 5, the 5th percentile of the activity duration distribution is 540 minutes, which is taken as the time threshold T_home; that is, the time spent at home is 9 hours. In other embodiments T_home may be the duration at the 6th, 7th, 8th, 9th, or 10th percentile of the activity duration distribution, depending on the actual application.
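The percentile rule described above (take the m-th sorted value when m = n × a% is an integer, otherwise average the two neighbouring values) can be sketched as below. The function name `survey_percentile` is hypothetical, and the handling of m < 1, which the text does not cover, is an assumption: the smallest value is returned.

```python
from typing import Sequence


def survey_percentile(durations: Sequence[float], pct: int) -> float:
    """Percentile as described in the text, with 1-based rank m = n * pct / 100.

    - m is an integer: return the m-th smallest duration.
    - m is not an integer (e.g. 12.3): return the mean of the
      12th and 13th smallest durations.
    """
    if not durations:
        raise ValueError("durations must not be empty")
    if not 0 < pct < 100:
        raise ValueError("pct must be between 1 and 99")
    values = sorted(durations)
    n = len(values)
    # Integer arithmetic avoids floating-point trouble when testing
    # whether m is a whole number.
    whole, remainder = divmod(n * pct, 100)
    if remainder == 0:
        return float(values[whole - 1])
    if whole == 0:
        # Assumption: with m < 1 there is no lower neighbour.
        return float(values[0])
    return (values[whole - 1] + values[whole]) / 2.0


if __name__ == "__main__":
    # n = 246, pct = 5 gives m = 12.3, so the 12th and 13th values are averaged.
    print(survey_percentile(range(1, 247), 5))
```

The same helper applies to the thresholds T_home and T_work, which differ only in the set of activities whose durations are passed in.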

The time threshold T_work for identifying the workplace is likewise calibrated on the 2009 Shanghai traditional user travel survey data. The trip data in the survey are converted into activity chain data, and the user smart device data of users who leave home in the morning and return home in the evening are selected. Activities whose stay location is the workplace during working hours (9:00 to 17:00) are extracted, the distribution of activity duration is built, and the b-th percentile of that distribution is taken as the time threshold T_work, where b is any natural number from 5 to 10. In this embodiment the b-th percentile is obtained as follows: based on the 2009 Shanghai survey data, all n activity durations are sorted in ascending order and m = n × b% is computed; the m-th duration in the sorted list is the value of the b-th percentile. If m is not an integer, for example 12.3, the mean of the 12th and 13th durations in the sorted list is taken as the value of the b-th percentile. Referring to Figure 6, with the 2009 Shanghai survey data as the sample, the 5th percentile of the activity duration distribution is 165 minutes, that is, a stay of more than 2 hours and close to 3 hours at one place. A normal working day is generally 7 to 8 hours; 165 minutes is chosen as the threshold because some jobs do not involve staying at one place for long periods. Company managers or teachers, for example, may work at one place for only 2 to 3 hours. In addition to the 165-minute threshold, the number of stays and the consistency of the stay location are also considered, in order to exclude cases such as occasional shopping or eating out. In other embodiments T_work may be the duration at the 6th, 7th, 8th, 9th, or 10th percentile of the activity duration distribution, depending on the actual application.

In addition, it will be understood that the activity type "going to school" has activity characteristics similar to those of work, so the identification of school activities is included in the home and workplace identification.

S210: input the activity start time, activity duration, and land use type of the stay segments other than those of the two activity types "at home" and "at work" into the activity classifier, to obtain for each user the predefined activity types other than "at home" and "at work". Predefined activity types are activity types defined in advance according to statistical requirements. For example, they may include, without limitation, going to work, going to school, shopping, culture and entertainment, business, picking up or dropping off people, and going home.

The activity classifier is constructed as follows: based on the 2009 Shanghai traditional user travel survey data, the activity start time, activity duration, and land use type of each stay segment in the survey data are extracted and used to build a decision-tree-based activity classifier. Specifically, the land use type is first mapped to one of the stay point categories derived from the 2009 Shanghai survey data; this category, together with the activity start time and activity duration, is then input into the activity classifier to obtain the specific activity type. The decision rules of the trained activity classifier are shown in Figure 7 and are described in detail below. User travel surveys are held regularly in many cities, and the information they collect is fairly complete and covers the input feature data required for model calibration in the present invention. Although the sampling rate is not high (generally 1% of the total urban population), the number of samples is sufficient for model training and calibration. This embodiment uses the 2009 Shanghai user travel survey data to build the activity classifier model. In other embodiments, user travel survey data from other cities or other years may also be used.

Specifically, the correspondence between land use types and stay point categories is shown in Table 2:

Table 2

Statistical analysis of the 2009 Shanghai traditional user travel survey data gives the following activity types: ① going to work, ② going to school, ③ shopping, ④ culture and entertainment, ⑤ business, ⑥ picking up or dropping off people, ⑦ going home, and ⑧ other daily activities.

The decision rules of the activity classifier are now described in detail with reference to Figure 7, Table 2, and the eight activity types above. In Figure 7, GIS2 is the stay point category corresponding to the land use type, dur is the activity duration, and startTime is the activity start time, counted in minutes from midnight (0:00). First, the stay point category GIS2 is input. If GIS2 = 3 (shopping malls and shops), the activity type is ③ shopping and the decision ends. If GIS2 is not 3, the other stay point categories are checked in turn, as in Figure 7. If GIS2 = 8 (entertainment venues, tourist sites, exhibition halls, and sports venues), the activity type is ④ culture and entertainment. If GIS2 is not 8, the other categories are checked. If GIS2 = 7 or 12, it is checked whether the activity duration dur is less than 38 minutes. If so, it is checked whether startTime is less than 432 minutes: if it is, the activity type is ④ culture and entertainment; if not, it is checked whether startTime is less than 1040 minutes, in which case the activity type is ⑥ picking up or dropping off people, and otherwise it is ⑧ other daily activities. If startTime is not less than 432 minutes, the stay point category GIS2 is an external transport hub such as a railway station, long-distance bus station, wharf, or airport, and it is further checked whether startTime is less than 488 minutes: if so, the activity type is ⑥ picking up or dropping off people; otherwise it is ⑧ other daily activities.

If GIS2 is not 7 or 12, the case GIS2 = 2 or 5 is checked. If the duration is greater than or equal to 7 minutes, it is checked whether startTime is greater than or equal to 598 minutes. If it is not, the activity type is ⑥ picking up or dropping off people; if it is, it is checked whether the stay point category GIS2 is 2: if so, the activity type is ⑤ business, and if not it is ⑧ other daily activities. If startTime is not greater than or equal to 598 minutes, it is checked whether startTime is less than 438 minutes. If so, the activity type is ⑧ other daily activities. If startTime is not less than 438 minutes, it is checked whether the activity duration is greater than or equal to 72 minutes: if not, the activity type is ⑧ other daily activities; if so, it is checked whether the stay point category is an administrative or business office, in which case the activity type is ⑤ business, and otherwise it is ⑧ other daily activities.

If the stay point category is not 2 or 5, it is checked whether the activity duration is less than 22 minutes. If not, the activity type is ⑧ other daily activities. If so, it is checked whether the stay point category is 4, 6, or 9: if it is, the activity type is ⑧ other daily activities. If it is not, it is checked whether the activity duration is less than 9.5 minutes: if so, the activity type is ⑥ picking up or dropping off people; if not, it is checked whether startTime is less than 500 minutes, in which case the activity type is ⑥ picking up or dropping off people, and otherwise it is ⑧ other daily activities.

With these decision rules, inputting the start time and duration of an activity together with the stay point category corresponding to its land use type determines the activity type for activities other than working, being at home, or going to school.
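Part of the rule set above can be written out as plain conditionals. The sketch below is a partial illustration only: it covers the branches for GIS2 = 3, GIS2 = 8, and the final group of categories (those other than 2, 5, 7, and 12), and it returns `None` for categories 2, 5, 7, and 12, whose sub-rules should be read from Figure 7. The function name `classify_activity` and the English labels are hypothetical.

```python
from typing import Optional

SHOPPING = "shopping"                       # activity type 3
CULTURE = "culture and entertainment"       # activity type 4
PICK_UP = "picking up or dropping off"      # activity type 6
OTHER = "other daily activities"            # activity type 8


def classify_activity(gis2: int, dur: float, start_time: float) -> Optional[str]:
    """Partial decision rules.

    gis2: stay point category; dur: activity duration in minutes;
    start_time: minutes since midnight.
    Returns None for categories 2, 5, 7 and 12 (not reproduced here).
    """
    if gis2 == 3:
        return SHOPPING
    if gis2 == 8:
        return CULTURE
    if gis2 in (2, 5, 7, 12):
        return None  # sub-rules for these categories are left to Figure 7
    # Remaining categories: short stays may be pick-up / drop-off.
    if dur >= 22:
        return OTHER
    if gis2 in (4, 6, 9):
        return OTHER
    if dur < 9.5:
        return PICK_UP
    if start_time < 500:
        return PICK_UP
    return OTHER


if __name__ == "__main__":
    print(classify_activity(1, 5, 600))
```

Writing the rules as early returns keeps each threshold (22, 9.5, and 500 minutes) on its own line, which makes it easy to compare the code against the figure.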

The method provided by the present invention for identifying user activity types based on smart device data can determine, from a user's smart device data, the user's activity types in different periods of the day, and can therefore provide an important reference for urban planning, traffic management, and research on user activity. Moreover, the time thresholds and spatial thresholds used in the analysis are all calibrated on traditional user travel survey data, which makes the method more practical, more accurate, and more timely.

The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of these technical features that involves no contradiction should be regarded as falling within the scope of this specification.

The embodiments described above represent only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make a number of variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent is therefore defined by the appended claims.

Claims (10)

1. A method for identifying user activity types based on smart device data, characterized by comprising the following steps:

acquiring user smart device data;

performing data cleaning on the smart device data to obtain the user's movement trajectory, and dividing the trajectory into stay segments and travel segments through trip identification to obtain the user's trip chain;

extracting the activity start time and activity duration of each stay segment in the trip chain, and obtaining the land use type corresponding to each stay segment from the points of interest of that stay segment;

analyzing the smart device data for the user's travel over a preset period to obtain each user's stay periods, stay durations, and number of stays, and determining from these the location of the user's home and/or workplace, so as to obtain the two activity types of at home and at work;

inputting the activity start time, activity duration, and land use type of the stay segments other than those of the two activity types of at home and at work into an activity classifier, so as to obtain for each user the predefined activity types other than at home and at work.

2. The method for identifying user activity types based on smart device data according to claim 1, characterized in that the method further comprises constructing the activity classifier, including:

based on traditional user travel survey data, extracting the activity start time, activity duration, and corresponding land use type of each user's stay segments in the survey data, and constructing a decision-tree-based activity classifier.

3. The method for identifying user activity types based on smart device data according to claim 1, characterized in that the data cleaning comprises missing-field processing, deleting records with abnormal IMSI numbers, deleting records that cannot be matched with the base station positioning data, deleting duplicate data, ping-pong effect processing, and signal drift processing.

4. The method for identifying user activity types based on smart device data according to claim 3, characterized in that the ping-pong effect processing comprises the following step: merging regions of each user's smart device data by space and time; if the user's signal fluctuates within a range smaller than a spatial threshold L1 for longer than a time threshold T1, the user is considered to be at the same location during that time.

5. The method for identifying user activity types based on smart device data according to claim 3, characterized in that the signal drift processing comprises the following step: merging regions of each user's smart device data by space and time; if the user leaves the range of a spatial threshold L2 within a time threshold T2 and then returns to within the spatial threshold L2, the user is considered to be at the same location.

6. The method for identifying user activity types based on smart device data according to claim 1, characterized in that the trip identification comprises the following step: if the user's trajectory points within a time threshold T_stay are clustered within a radius L_stay, or the moving speed within the time threshold T_stay is lower than a speed threshold V_stay, the corresponding segment is a stay segment; otherwise it is a travel segment.

7. The method for identifying user activity types based on smart device data according to claim 6, characterized by further comprising the following steps:

converting the trip data in the user travel survey data into activity chain data, and selecting the data of users who leave home in the morning and return home in the evening;

extracting the activity durations of all types of activities, building the activity duration distribution, and taking the p-th percentile of the activity duration distribution as the time threshold T_stay for trip identification, where p is any natural number from 5 to 10.

8. The method for identifying user activity types based on smart device data according to claim 1, characterized in that obtaining the land use type corresponding to the stay segment from the points of interest of the stay segment comprises the following steps:

computing the center coordinates of the stay location by time weighting;

building a kernel density estimation model from the positions and number of the points of interest corresponding to the center coordinates, with the formula:

$$f(s) = \sum_{i=1}^{n} \frac{1}{n r^{2}} K\left(\frac{d_{i,s}}{r}\right)$$

K(·) is the kernel function;

r is the window width;

n is the total number of points of interest;

d_{i,s} is the distance from the center coordinates to each point of interest;

choosing a Gaussian function as the kernel function:

$$K\left(\frac{d_{i,s}}{r}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{d_{i,s}^{2}}{2 r^{2}}\right)$$

computing the kernel density values of the different POI types at the stay location, and taking the land use type corresponding to the POI type with the highest kernel density value as the land use type of the stay location.

9. The method for identifying user activity types based on smart device data according to claim 1, characterized in that analyzing the smart device data for the user's travel over a preset period to obtain the corresponding stay periods, stay durations, and number of stays, and determining from these the location of the user's home and/or workplace so as to obtain the two activity types of at home and at work, comprises the following steps:

extracting all the activity start times, activity durations, and land use types of each user;

selecting each user's weekday data and counting the total number of days N;

for each stay location, counting the total number of days N_home on which the night-time stay duration is greater than T_home;

if N_home is greater than a first judgment threshold, the location is the home location; otherwise, counting the total number of days N_work on which the stay duration during working hours is greater than T_work;

if N_work is greater than a second judgment threshold, the location is the workplace.

10. The method for identifying user activity types based on smart device data according to claim 9, characterized by further comprising the following steps:

converting the trip data in the user travel survey data into activity chain data, and selecting the user smart device data of users who leave home in the morning and return home in the evening;

extracting the activities whose night-time stay location is home, building the activity duration distribution, and taking the a-th percentile of the activity duration distribution as the time threshold T_home for identifying the home location, where a is any natural number from 5 to 10;

extracting the activities whose stay location during working hours is the workplace, building the activity duration distribution, and taking the b-th percentile of the activity duration distribution as the time threshold T_work for identifying the workplace, where b is any natural number from 5 to 10.
CN201610443684.1A 2016-06-20 2016-06-20 User Activity type identification method based on smart machine data Pending CN107529135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610443684.1A CN107529135A (en) 2016-06-20 2016-06-20 User Activity type identification method based on smart machine data


Publications (1)

Publication Number Publication Date
CN107529135A true CN107529135A (en) 2017-12-29

Family

ID=60733855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610443684.1A Pending CN107529135A (en) 2016-06-20 2016-06-20 User Activity type identification method based on smart machine data

Country Status (1)

Country Link
CN (1) CN107529135A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429856A (en) * 2018-02-28 2018-08-21 维沃移动通信有限公司 A positioning information acquisition method and mobile terminal
CN108668238A (en) * 2018-08-16 2018-10-16 天狼联盟材料科技研究(广东)有限公司 A kind of shoes and its recording method based on the life of APP records and movement locus
CN109493119A (en) * 2018-10-19 2019-03-19 南京图申图信息科技有限公司 A kind of city commercial center identification method and system based on POI data
CN109788428A (en) * 2018-12-28 2019-05-21 科大国创软件股份有限公司 A kind of user's classifying identification method based on carrier data
CN109918582A (en) * 2019-03-06 2019-06-21 上海评驾科技有限公司 A kind of user's list point of interest knowledge method for distinguishing based on space-time data
CN110572776A (en) * 2019-09-20 2019-12-13 奇酷互联网络科技(深圳)有限公司 method for dividing safety area, terminal and storage medium
CN111367896A (en) * 2018-12-25 2020-07-03 北京融信数联科技有限公司 User personalized activity map construction method based on big data
CN112866920A (en) * 2021-01-07 2021-05-28 东南大学 Method for identifying employment place by processing mobile phone signaling data through kernel function
CN113268679A (en) * 2021-04-19 2021-08-17 宁波市测绘和遥感技术研究院 Visual processing method based on internet big data
CN114419749A (en) * 2021-12-20 2022-04-29 优得新能源科技(宁波)有限公司 Photovoltaic power plant fortune dimension personnel work quantization system based on location coordinate
CN119357721A (en) * 2024-12-24 2025-01-24 浙江大学 A clustering method for individual multi-day activity patterns based on mobile phone signaling data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682041A (en) * 2011-03-18 2012-09-19 日电(中国)有限公司 User behavior identification equipment and method
CN103460722A (en) * 2011-03-31 2013-12-18 高通股份有限公司 Methods, devices, and apparatuses for activity classification using temporal scaling of time-referenced features
CN104159189A (en) * 2013-05-15 2014-11-19 同济大学 Resident trip information obtaining method based on intelligent mobile phone
CN104680046A (en) * 2013-11-29 2015-06-03 华为技术有限公司 User activity recognition method and device
US20160133295A1 (en) * 2014-11-07 2016-05-12 H4 Engineering, Inc. Editing systems

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682041A (en) * 2011-03-18 2012-09-19 日电(中国)有限公司 User behavior identification equipment and method
CN103460722A (en) * 2011-03-31 2013-12-18 高通股份有限公司 Methods, devices, and apparatuses for activity classification using temporal scaling of time-referenced features
CN104159189A (en) * 2013-05-15 2014-11-19 同济大学 Resident trip information obtaining method based on intelligent mobile phone
CN104680046A (en) * 2013-11-29 2015-06-03 华为技术有限公司 User activity recognition method and device
US20160133295A1 (en) * 2014-11-07 2016-05-12 H4 Engineering, Inc. Editing systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨超; 朱荣荣; 涂然: "基于智能手机调查数据的居民出行活动特征分析", 《交通信息与安全》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429856A (en) * 2018-02-28 2018-08-21 维沃移动通信有限公司 A positioning information acquisition method and mobile terminal
CN108668238A (en) * 2018-08-16 2018-10-16 天狼联盟材料科技研究(广东)有限公司 A kind of shoes and its recording method based on the life of APP records and movement locus
CN109493119B (en) * 2018-10-19 2020-06-23 南京图申图信息科技有限公司 POI data-based urban business center identification method and system
CN109493119A (en) * 2018-10-19 2019-03-19 南京图申图信息科技有限公司 A kind of city commercial center identification method and system based on POI data
CN111367896A (en) * 2018-12-25 2020-07-03 北京融信数联科技有限公司 User personalized activity map construction method based on big data
CN109788428A (en) * 2018-12-28 2019-05-21 科大国创软件股份有限公司 A kind of user's classifying identification method based on carrier data
CN109788428B (en) * 2018-12-28 2020-12-18 科大国创软件股份有限公司 User classification identification method based on operator data
CN109918582A (en) * 2019-03-06 2019-06-21 上海评驾科技有限公司 A kind of user's list point of interest knowledge method for distinguishing based on space-time data
CN110572776A (en) * 2019-09-20 2019-12-13 奇酷互联网络科技(深圳)有限公司 method for dividing safety area, terminal and storage medium
CN112866920A (en) * 2021-01-07 2021-05-28 东南大学 Method for identifying employment place by processing mobile phone signaling data through kernel function
CN113268679A (en) * 2021-04-19 2021-08-17 宁波市测绘和遥感技术研究院 Visual processing method based on internet big data
CN114419749A (en) * 2021-12-20 2022-04-29 优得新能源科技(宁波)有限公司 Work quantification system for photovoltaic power plant operation and maintenance personnel based on positioning coordinates
CN119357721A (en) * 2024-12-24 2025-01-24 浙江大学 A clustering method for individual multi-day activity patterns based on mobile phone signaling data

Similar Documents

Publication Publication Date Title
CN107529135A (en) User activity type identification method based on smart device data
Alexander et al. Origin–destination trips by purpose and time of day inferred from mobile phone data
Xu et al. Another tale of two cities: Understanding human activity space using actively tracked cellphone location data
Huang et al. Transport mode detection based on mobile phone network data: A systematic review
CN107241512B (en) Method and device for judging intercity traffic travel mode based on mobile phone data
CN107305590B (en) A method for determining urban traffic travel characteristics based on mobile phone signaling data
Widhalm et al. Discovering urban activity patterns in cell phone data
Gundlegård et al. Travel demand estimation and network assignment based on cellular network data
CN102595323B (en) Method for obtaining resident travel characteristic parameter based on mobile phone positioning data
WO2017133627A1 (en) User commuter track management method, device and system
Qian et al. Characterizing urban dynamics using large scale taxicab data
CN105701123B (en) The recognition methods of man-vehicle interface and device
CN109583640A (en) Urban traffic passenger flow attribute identification method based on multi-source positioning data
CN104902438B (en) Statistical method and system for analyzing passenger flow characteristic information based on mobile communication terminals
CN106600960A (en) Traffic travel origin and destination identification method based on space-time clustering analysis algorithm
CN106912015A (en) Personnel trip chain identification method based on mobile network data
CN105142106A (en) Traveler home-work location identification and trip chain depicting method based on mobile phone signaling data
CN107133318A (en) Population identification method based on mobile phone signaling data
CN105682025A (en) User residing location identification method based on mobile signaling data
CN109684373B (en) Key relation person discovery method based on travel and call ticket data analysis
CN106790468A (en) Distributed implementation method for analyzing a user's WiFi event trace rules
CN106931974A (en) Method for calculating personal commuting distance based on mobile terminal GPS location data records
CN110990443A (en) Mobile phone signaling-based method for estimating characteristics of the employed and resident population
Fekih et al. Potential of cellular signaling data for time-of-day estimation and spatial classification of travel demand: a large-scale comparative study with travel survey and land use data
CN111651529A (en) Classification and identification method of airport air passengers based on mobile phone signaling data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171229