CN106682051A

CN106682051A - Method for finding out crowd movement behaviors

Info

Publication number: CN106682051A
Application number: CN201510982408.8A
Authority: CN
Inventors: 王恩慈; 吴泰廷; 高崎钧; 王昭智; 郭奕宏
Original assignee: Industrial Technology Research Institute ITRI
Current assignee: Industrial Technology Research Institute ITRI
Priority date: 2015-11-09
Filing date: 2015-12-24
Publication date: 2017-05-17
Anticipated expiration: 2035-12-24
Also published as: CN106682051B

Abstract

The invention discloses a method for finding out crowd movement behaviors, which comprises the following steps: collecting a plurality of location data regarding a plurality of user devices; detecting a plurality of conventional patterns in the position data to generate a plurality of representative sequences, wherein each representative sequence comprises at least one line segment between a starting position point and an ending position point; and classifying the representative sequences into a plurality of sets according to a plurality of sequence distances among the representative sequences so as to find the moving behaviors of the crowd.

Description

A way to find out the movement behavior of crowds

技术领域technical field

本发明是有关于一种藉由收集关于用户装置的位置数据，以找出人群移动行为的方法。The present invention relates to a method for finding out the movement behavior of crowds by collecting location data about user devices.

背景技术Background technique

对于许多公司与组织，例如连锁便利商店、大众运输公司、地方政府等等，知道人群于城市中或是在多个城市间如何移动可能是很重要的信息。对于这些组织而言，有许多重大的决策需仰赖关于人群在哪里以及人群从何处来又移动到何处的信息，这些决策例如是设定新的公交车路线、建造新的交通转运站、开设新的店面、以及建造城市公共设施。因此，如何有效地找出人群移动行为的相关信息，乃目前业界所致力的课题之一。For many companies and organizations, such as convenience store chains, mass transit companies, local governments, etc., knowing how people move within a city or between multiple cities can be important information. For these organizations, there are many important decisions that depend on information about where people are and where they come from and move to, such as setting new bus routes, building new transit stations, Open new storefronts, and build urban public facilities. Therefore, how to effectively find out the relevant information of the crowd's mobile behavior is one of the topics that the industry is currently working on.

发明内容Contents of the invention

本发明是有关于一种找出人群移动行为的方法及执行此方法的非瞬时计算机可读取媒体。The present invention relates to a method for finding crowd movement behavior and a non-transitory computer-readable medium for executing the method.

根据本发明之一实施例，提出一种找出人群移动行为的方法，方法包括：收集关于多个用户装置的多个位置数据；检测这些位置数据中的多个惯常模式以产生多个代表性序列，其中各代表性序列包括至少一线段，该至少一线段在起始位置点与结束位置点之间；以及根据这些代表性序列之间的多个序列距离，将这些代表性序列分类为多个集合，以找出人群移动行为。According to one embodiment of the present invention, a method for finding out the movement behavior of a crowd is proposed, the method includes: collecting a plurality of location data about a plurality of user devices; detecting a plurality of habitual patterns in the location data to generate a plurality of representative Sequences, wherein each representative sequence includes at least one segment, the at least one segment is between the start position point and the end position point; and according to a plurality of sequence distances between the representative sequences, the representative sequences are classified into multiple A collection to find out the crowd movement behavior.

为了对本发明之上述及其他方面有更佳的了解，下文特举依据本发明的可实施范例，并配合所附图式，作详细说明如下：In order to have a better understanding of the above and other aspects of the present invention, the implementation examples according to the present invention are specifically cited below, together with the accompanying drawings, for detailed description as follows:

附图说明Description of drawings

图1绘示以智能卡进行支付活动的一范例示意图。FIG. 1 is a schematic diagram of an example of payment activities performed by a smart card.

图2绘示依据本发明一实施例之找出人群移动行为方法的流程图。FIG. 2 is a flow chart of a method for finding crowd movement behavior according to an embodiment of the present invention.

图3绘示一范例支付记录的示意图，支付记录相关于从多个用户装置撷取的位置数据以及时间。FIG. 3 illustrates a schematic diagram of an example payment record related to location data and time captured from multiple user devices.

图4绘示依据本发明一实施例之收集关于用户装置的位置数据的流程图。FIG. 4 illustrates a flow chart of collecting location data about a user device according to an embodiment of the invention.

图5A以及图5B绘示依据本发明一实施例之将支付位置以最接近的参考位置点标注的范例示意图。FIG. 5A and FIG. 5B are schematic diagrams illustrating an example of marking the payment location with the closest reference location point according to an embodiment of the present invention.

图6绘示依据本发明一实施例之整理后与简化后的支付记录的示意图。FIG. 6 is a schematic diagram of sorted and simplified payment records according to an embodiment of the present invention.

图7绘示依据本发明一实施例之检测位置数据中的惯常模式以产生代表性序列的流程图。FIG. 7 illustrates a flow chart of detecting common patterns in location data to generate a representative sequence according to an embodiment of the present invention.

图8绘示依据本发明一实施例之聚合惯常模式的示意图。FIG. 8 is a schematic diagram illustrating a conventional mode of aggregation according to an embodiment of the present invention.

图9绘示依据本发明一实施例之计算代表性序列之间的序列距离的流程图。FIG. 9 is a flow chart of calculating the sequence distance between representative sequences according to an embodiment of the present invention.

图10绘示依据本发明一实施例之计算第一线段与第二线段之间的线段距离的流程图。FIG. 10 is a flow chart of calculating a line segment distance between a first line segment and a second line segment according to an embodiment of the present invention.

图11绘示依据本发明一实施例之两线段之间线段距离的示意图。FIG. 11 is a schematic diagram illustrating the line segment distance between two line segments according to an embodiment of the present invention.

图12绘示依据本发明一实施例之计算第一线段与第二线段之间平行距离的流程图。FIG. 12 is a flow chart of calculating the parallel distance between the first line segment and the second line segment according to an embodiment of the present invention.

图13绘示依据本发明一实施例之两线段之间线段距离的示意图。FIG. 13 is a schematic diagram illustrating the line segment distance between two line segments according to an embodiment of the present invention.

图14绘示依据本发明一实施例之计算正规化角距离、正规化垂直距离、及正规化平行距离的流程图。FIG. 14 shows a flow chart of calculating normalized angular distance, normalized vertical distance, and normalized parallel distance according to an embodiment of the present invention.

图15A～图15D绘示依据本发明一实施例之考虑垂直距离域最大值的多种情形示意图。15A-15D are schematic diagrams illustrating various situations considering the maximum value of the vertical distance field according to an embodiment of the present invention.

图16绘示依据本发明一实施例之决定第一序列与第二序列之间序列距离的流程图。FIG. 16 is a flow chart of determining the sequence distance between the first sequence and the second sequence according to an embodiment of the present invention.

图17A～图17C绘示依据本发明一实施例之两个代表性序列之间多个映像组合的示意图。17A-17C are schematic diagrams illustrating combinations of multiple images between two representative sequences according to an embodiment of the present invention.

图18绘示两个代表性序列之间的一种无效映像的范例示意图。FIG. 18 is a schematic diagram illustrating an example of an invalid mapping between two representative sequences.

图19绘示依据本发明一实施例之计算映像组合的映像距离的流程图。FIG. 19 is a flow chart of calculating the image distance of an image combination according to an embodiment of the present invention.

图20绘示依据本发明一实施例之将代表性序列分类为集合的流程图。FIG. 20 illustrates a flowchart for classifying representative sequences into sets according to an embodiment of the present invention.

图21绘示依据本发明一实施例之找出人群移动行为以及找出典型序列的流程图。FIG. 21 is a flow chart of finding out crowd movement behavior and finding out a typical sequence according to an embodiment of the present invention.

图22绘示依据本发明一实施例之找出目标日期类型的典型序列的流程图。FIG. 22 is a flowchart illustrating a typical sequence for finding a target date type according to an embodiment of the present invention.

具体实施方式detailed description

为使本发明的目的、技术方案和优点更加清楚明白，以下结合具体实施例，并参照附图，对本发明作进一步的详细说明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

现代生活中许多人使用智能卡搭乘大众运输，例如公交车、火车、捷运等等，此外，智能卡也可作为电子货币包以购买物品或支付费用，举例而言，智能卡可预存资金在卡内，可用于贩卖机、进出停车场、或进出火车站。图1绘示以智能卡进行支付活动的一范例示意图，于此范例中的智能卡是一种非接触式的智能卡。每当使用智能卡时，会产生一笔支付记录(Payment Log)或是交通记录，而由于贩卖机或是车站闸门可具有静态的地理信息，因此可以收集到关于多个使用者智能卡于使用时的位置数据。发行智能卡的服务提供商可以藉由收集这些支付记录，以获得人群在哪里以及人群如何移动的相关信息。In modern life, many people use smart cards to take public transportation, such as buses, trains, subways, etc. In addition, smart cards can also be used as electronic currency wallets to purchase items or pay expenses. For example, smart cards can pre-store funds in the card, Can be used for vending machines, entering and exiting parking lots, or entering and exiting train stations. FIG. 1 is a schematic diagram of an example of payment activities using a smart card. In this example, the smart card is a contactless smart card. Whenever a smart card is used, a payment record (Payment Log) or traffic record will be generated, and since vending machines or station gates can have static geographical information, it is possible to collect data about multiple users' smart cards when they are used. location data. Service providers who issue smart cards can collect these payment records to gain information about where people are and how people are moving.

图2绘示依据本发明一实施例之找出人群移动行为方法的流程图，找出人群移动行为的方法包括下列步骤。步骤S100，收集关于多个用户装置的位置数据。步骤S200，检测(Mining)位置数据中的惯常模式(Frequent Patterns)以产生多个代表性序列(Representative Sequences)。步骤S300，根据代表性序列之间的序列距离，将代表性序列分类为集合(Cluster)，以找出人群移动行为。其中各代表性序列包括至少一线段，该至少一线段在起始位置点与结束位置点之间。此方法例如可由软件程序实作，软件程序可储存于光盘上，软件程序可以包括多个相关于计算机处理器的指令，这些指令可被计算机处理器加载以执行上述的找出人群移动行为的方法。关于各步骤的详细说明如下。FIG. 2 is a flow chart of a method for finding crowd movement behavior according to an embodiment of the present invention. The method for finding crowd movement behavior includes the following steps. Step S100, collecting location data about a plurality of user devices. Step S200 , detecting (Mining) frequent patterns in the location data to generate a plurality of representative sequences (Representative Sequences). Step S300, according to the sequence distance between the representative sequences, classify the representative sequences into clusters (Cluster), so as to find out the movement behavior of the crowd. Wherein each representative sequence includes at least one segment, and the at least one segment is between the start position point and the end position point. This method can be implemented by a software program, for example, the software program can be stored on a CD, and the software program can include a plurality of instructions related to the computer processor, and these instructions can be loaded by the computer processor to perform the above-mentioned method for finding out the movement behavior of the crowd . A detailed description of each step follows.

步骤S100：收集关于多个用户装置的位置数据。用户装置可包括智能卡、电子付费卡、或是具有支付能力的行动装置，当这些用户装置用于支付活动时，可收集相关的位置数据。举例而言，当智能卡用于在一个支付终端机进行支付活动时，可上传一个报告至中央服务器，这个报告可包括智能卡的身份认证(Identification，ID)、支付金额、日期时间、支付终端机的位置。本发明的找出人群移动行为方法并不限于在进行支付活动时收集位置数据，可收集位置数据的时机亦可包括：当用户装置进入车站时、当钱被存入用户装置时、或者当用户装置经过认证而进入建筑物时。为了便于理解，以下说明将以在进行支付活动时收集位置数据作为范例说明，并以支付记录代表收集到的位置数据。Step S100: Collect location data about a plurality of user devices. The user device may include a smart card, an electronic payment card, or a mobile device with payment capabilities. When these user devices are used for payment activities, relevant location data may be collected. For example, when a smart card is used to carry out payment activities at a payment terminal, a report can be uploaded to the central server, and this report can include the smart card's identity authentication (Identification, ID), payment amount, date and time, payment terminal Location. The method for finding out crowd movement behavior of the present invention is not limited to collecting location data when payment activities are performed, and the timing of collecting location data may also include: when the user device enters a station, when money is deposited into the user device, or when the user When a device is authenticated to enter a building. For ease of understanding, the following description will take location data collected during payment activities as an example, and payment records will represent the collected location data.

图3绘示一范例支付记录的示意图，支付记录相关于从多个用户装置撷取的位置数据以及时间。支付记录可储存于支付服务提供商的中央服务器。于此范例中，支付记录包括的字段有：uid、离开位置、到达位置、日期时间、支付金额、交易类型。uid即代表用户装置的ID，亦即，相同的uid对应到相同的用户装置，即可能对应到相同的使用者，因此可以获得一个人身处何处的信息，藉由收集关于多个用户装置的位置数据，服务提供商可以得知大多数的人有怎样的共同移动轨迹。交易类型可以是购买、交通、存钱、或其他类型，对于交通类型，支付活动可能是在目的地车站进行，而离开位置与到达位置分别记录相关的交通信息，例如是离开车站与到达车站的地理坐标。对于交通以外的其他交易类型，可使用离开位置字段记录支付活动的位置，而到达位置字段则可以使用「-」记录。FIG. 3 illustrates a schematic diagram of an example payment record related to location data and time captured from multiple user devices. Payment records can be stored in the central server of the payment service provider. In this example, the payment record includes fields: uid, departure location, arrival location, date and time, payment amount, and transaction type. uid represents the ID of the user device, that is, the same uid corresponds to the same user device, that is, it may correspond to the same user, so it is possible to obtain information about where a person is, by collecting information about multiple user devices With location data, service providers can learn what common movement trajectories most people have. The transaction type can be purchase, transportation, money saving, or other types. For the transportation type, the payment activity may be performed at the destination station, and the departure location and arrival location record relevant traffic information, such as the departure location and arrival location. geographic coordinates. For transaction types other than transportation, the location of the payment activity can be recorded using the departure location field, and "-" can be used for the arrival location field.

在上述的例子中，所记录的坐标可以是精确的位置数据。而支付记录内，可能由于有大量的商店使用支付服务，因此不同位置的精确位置坐标数量可能会很多。对于找出有兴趣的人群移动行为而言，可能不需要精确的位置，因此可将邻近的位置视为一个语义区域(Semantic Region)，并可以选择一个参考位置点(Reference Location Point)以代表一个语义区域。步骤S100可以包括步骤S110及步骤S120，如图4所绘示依据本发明一实施例之收集关于用户装置的位置数据的流程图。In the above example, the recorded coordinates may be precise location data. In the payment record, there may be a large number of precise location coordinates of different locations due to the fact that a large number of stores use the payment service. For finding out the moving behavior of people of interest, precise location may not be needed, so the adjacent location can be regarded as a semantic region (Semantic Region), and a reference location point (Reference Location Point) can be selected to represent a Semantic region. Step S100 may include step S110 and step S120 , as shown in FIG. 4 , a flow chart of collecting location data about a user device according to an embodiment of the present invention.

步骤S110：选择多个参考位置点。参考位置点的范例可以包括学校、连锁便利商店、根据居住人口决定的景观位置。步骤S120：将位置数据中的各个位置点，取代为与其地理位置最接近的参考位置点。对于支付记录中的每一笔支付数据，原本的精确位置点可以取代为最接近的一个参考位置点。图5A以及图5B绘示依据本发明一实施例之将支付位置以最接近的参考位置点标注的范例示意图。图5A以三个不同阴影填满的三角形表示三个预先选择的参考位置点Ref_a、Ref_b、Ref_c，以空心圆形原本的支付位置。接着，将每个支付位置取代为地理位置最接近的参考位置点，如图5B所示，每个支付位置的阴影填满状态改变为与其对应的参考位置点相同。在使用最接近的参考位置点标注支付位置之后，在原本支付记录内的地理坐标即可以取代为这些参考位置点。此处所提及的步骤S110及步骤S120为可选步骤，亦即，即使不执行步骤S110及步骤S120，支付记录中保存有原始的支付位置，后续的步骤S200以及步骤S300依然可以对原始的支付位置执行。Step S110: Select a plurality of reference position points. Examples of reference location points may include schools, chain convenience stores, and landscape locations determined by residential population. Step S120: Replace each location point in the location data with the closest reference location point to its geographic location. For each piece of payment data in the payment record, the original precise location point can be replaced by the closest reference location point. FIG. 5A and FIG. 5B are schematic diagrams illustrating an example of marking the payment location with the closest reference location point according to an embodiment of the present invention. Fig. 5A shows three pre-selected reference positions Ref_a, Ref_b, Ref_c with three triangles filled with different shades, and the original payment position with a hollow circle. Next, each payment location is replaced with the reference location point closest to the geographic location. As shown in FIG. 5B , the shadow filling state of each payment location changes to be the same as its corresponding reference location point. After marking the payment location with the closest reference location points, the geographic coordinates in the original payment record can be replaced by these reference location points. Step S110 and step S120 mentioned here are optional steps, that is, even if step S110 and step S120 are not executed, the original payment location is saved in the payment record, and the subsequent step S200 and step S300 can still be used for the original payment location. Pay location execution.

步骤S200：检测位置数据中的惯常模式以产生多个代表性序列。在数据收集以及上述的前置处理阶段之后，支付记录可以转为支付序列(Sequence)，每一个支付序列包括一序列的项目(Item)，代表着特定使用者在特定时间的整个支付轨迹，支付序列中的项目可以是如图5A及图5B所示的参考位置点，一个范例的支付序列可以是{id_677：Ref_h，Ref_c，Ref_c}。从多个支付序列中，可以使用循序样式检测(Sequential Pattern Mining)算法，例如是PrefixSpan或Generalized Sequential Pattern(GSP)算法，以找出支付序列中的惯常模式。在循序样式检测之后，可以得到对于特定时间中的代表性序列以及其对应的支持度(Support Count)，支持度可代表出现的次数，可于检测算法中计算得到。各个代表性序列包括至少一线段，该至少一线段在一起始位置点与一结束位置点之间，起始位置点及结束位置点可以是精确位置或是参考位置点。举例而言，支付记录中对应到交通类型的一笔支付资料，可以视为在离开位置以及到达位置之间的一线段。代表性序列的一个范例为<Ref_a，Ref_d，Ref_e>，此代表性序列包括两个线段，一个线段从Ref_a到Ref_d，另一个线段从Ref_d到Ref_e。Step S200: Detect the usual patterns in the location data to generate a plurality of representative sequences. After data collection and the above-mentioned pre-processing stage, payment records can be converted into payment sequences (Sequence), each payment sequence includes a sequence of items (Item), representing the entire payment track of a specific user at a specific time, payment Items in the sequence may be reference points as shown in FIG. 5A and FIG. 5B , and an example payment sequence may be {id_677: Ref_h, Ref_c, Ref_c}. From multiple payment sequences, a sequential pattern detection (Sequential Pattern Mining) algorithm, such as PrefixSpan or Generalized Sequential Pattern (GSP) algorithm, can be used to find out the usual pattern in the payment sequence. After the sequential pattern detection, the representative sequence in a specific time and its corresponding support (Support Count) can be obtained. The support can represent the number of occurrences and can be calculated in the detection algorithm. Each representative sequence includes at least one segment between a start position point and an end position point. The start position point and the end position point may be precise positions or reference position points. For example, a piece of payment information corresponding to the traffic type in the payment record can be regarded as a segment between the departure location and the arrival location. An example of a representative sequence is <Ref_a, Ref_d, Ref_e>, which includes two line segments, one from Ref_a to Ref_d and the other from Ref_d to Ref_e.

在步骤S200当中，支付记录可以先根据uid以及日期时间进行排序，例如可以将对应到相同用户装置的交易群聚在一起，并且将对应此用户装置的事务数据依据时间顺序进行排序。此外，各个位置点可以使用最接近的参考位置点标注，如图5B所示。图6绘示依据本发明一实施例之整理后与简化后的支付记录的示意图。在此例中，对于uid 604可以形成一个交易序列：<Ref_a，Ref_d，Ref_e>，而对于uid 677可以形成另一个交易序列：<Ref_h，Ref_c，Ref_c>，接着便可以对多个交易序列应用循序样式检测算法以找出惯常模式。In step S200, the payment records can be sorted according to uid and date and time, for example, the transactions corresponding to the same user device can be grouped together, and the transaction data corresponding to the user device can be sorted according to time order. In addition, each location point can be marked with the closest reference location point, as shown in FIG. 5B . FIG. 6 is a schematic diagram of sorted and simplified payment records according to an embodiment of the present invention. In this example, one transaction sequence can be formed for uid 604: <Ref_a, Ref_d, Ref_e>, and another transaction sequence can be formed for uid 677: <Ref_h, Ref_c, Ref_c>, and then can be applied to multiple transaction sequences A sequential pattern detection algorithm to find common patterns.

图7绘示依据本发明一实施例之检测位置数据中的惯常模式以产生代表性序列的流程图。步骤S200可包括步骤S210、步骤S220、及步骤S230，这些步骤在样式检测之后执行。步骤S210，若是有一个惯常模式仅具有单一位置点，则移除此惯常模式。步骤S220，在每个惯常模式中，移除相同的相邻位置点。由于是找出人群移动行为的方法，因此会排除那些代表停留在原地的惯常模式，例如是<Ref_a>以及<Ref_a，Ref_a>。此外，包括有至少两个相同的相邻位置点的惯常模式，例如<Ref_h，Ref_c，Ref_c>，同样也会排除重复的相邻位置点。如此得到的代表性序列不会包括有相同的相邻位置点，每个代表性序列当中的每一个线段皆代表人群移动行为的方向。FIG. 7 illustrates a flow chart of detecting common patterns in location data to generate a representative sequence according to an embodiment of the present invention. Step S200 may include step S210, step S220, and step S230, and these steps are performed after pattern detection. Step S210, if there is a habitual pattern with only a single point, remove the habitual pattern. Step S220, in each usual mode, remove the same adjacent position points. Since it is a method to find out the moving behavior of the crowd, it will exclude those usual patterns that represent staying in place, such as <Ref_a> and <Ref_a, Ref_a>. In addition, common patterns that include at least two identical adjacent locations, such as <Ref_h, Ref_c, Ref_c>, will also exclude duplicate adjacent locations. The representative sequences obtained in this way will not include the same adjacent position points, and each line segment in each representative sequence represents the direction of the crowd's moving behavior.

步骤S230，将数日的惯常模式聚合以产生代表性序列。收集到的支付记录可以依照不同时间以及不同日期进行分类，这是由于在不同的时间区段中，人群移动行为轨迹可能不相同。举例而言，6：30am～9：30am、10：30am～13：30am、4：30pm～7：30pm这三个时段可能分别对应到不同的人群活动。若是需要多日累积下来针对特定时间区段的统计分析，则可以将这个时间区段对应的多日的惯常模式进行聚合。图8绘示依据本发明一实施例之聚合惯常模式的示意图。在此范例中，五月当中所有工作日的时段6：30am～9：30am当中的代表性序列被聚合，表格中的数字代表代表性序列的支持度。如图8所示，每一个工作天的序列<Ref_d，Ref_e>经累加后，所产生整个五月这个序列的支持度为5750。数字23/23是出现率(Occurrence Rate)，代表这个序列<Ref_d，Ref_e>在五月的23个工作天中出现23次。藉由聚合多日的统计数据，更可以准确地找出大多数人的移动趋势。Step S230, aggregate the habitual patterns of several days to generate a representative sequence. The collected payment records can be classified according to different time and different dates, because in different time segments, the trajectory of crowd movement behavior may be different. For example, the three time periods of 6:30am to 9:30am, 10:30am to 13:30am, and 4:30pm to 7:30pm may respectively correspond to different crowd activities. If it is necessary to accumulate statistical analysis for a specific time period over multiple days, the usual pattern of multiple days corresponding to this time period can be aggregated. FIG. 8 is a schematic diagram illustrating a conventional mode of aggregation according to an embodiment of the present invention. In this example, the representative sequences from 6:30am to 9:30am on all working days in May are aggregated, and the numbers in the table represent the support of the representative sequences. As shown in Figure 8, after the accumulation of the sequence <Ref_d, Ref_e> of each working day, the support degree of the generated sequence in May is 5750. The number 23/23 is the Occurrence Rate, which means that the sequence <Ref_d, Ref_e> appeared 23 times in 23 working days in May. By aggregating multi-day statistical data, it is possible to accurately find out the movement trend of most people.

步骤S300，根据代表性序列之间的序列距离，将代表性序列分类为集合，以找出人群移动行为。在执行步骤S100及步骤S200之后，关于人群移动行为的代表性序列已被找出，接着可将相似的代表性序列分类为集合，以找出在较大范围区域之间的人群移动行为。如此作法是有益处的，因为大多数有兴趣的人群移动行为是关于在两个大范围区域之间的移动，而非两个特定建筑物之间的移动。在步骤S300可计算两个代表性序列之间的序列距离，以决定这两个代表性序列的相似程度。举例而言，较短的序列距离代表这两个代表性序列有较高的相似程度(例如于地理位置上较为靠近)。可以根据如此计算得到的序列距离，将代表性序列分类为集合。Step S300, according to the sequence distance between the representative sequences, classify the representative sequences into sets, so as to find out the movement behavior of the crowd. After step S100 and step S200 are performed, the representative sequence of the crowd movement behavior has been found, and then similar representative sequences can be classified into sets to find the crowd movement behavior between large areas. Doing so is beneficial because most movement behavior of people of interest is about moving between two large areas rather than between two specific buildings. In step S300, the sequence distance between two representative sequences can be calculated to determine the degree of similarity between the two representative sequences. For example, a shorter sequence distance indicates that the two representative sequences have a higher degree of similarity (eg, closer geographically). Representative sequences can be classified into sets based on the sequence distances thus calculated.

以下说明一个关于计算序列距离的范例。一个代表性序列可视为一序列的线段，在此说明范例中，代表性序列包括有第一序列Seq_a以及第二序列Seq_b。第一序列Seq_a包括第一线段L1，第一线段L1在第一起始位置点L1_s与第一结束位置点L1_e之间。第二序列Seq_b包括第二线段L2，第二线段L2在第二起始位置点L2_s与第二结束位置点L2_e之间。第一序列Seq_a与第二序列Seq_b之间的序列距离，是根据第一线段L1与第二线段L2之间的线段距离而决定。第一线段L1具有方向性(从第一起始点L1_s指向第一结束点L1_e)，第二线段L2亦同样具有方向性(从第二起始点L2_s指向第二结束点L1_e)。因此于以下的说明中，将使用向量以及分别代表第一线段L1以及第二线段L2。The following illustrates an example of calculating sequence distance. A representative sequence can be regarded as a sequence of line segments. In this illustrative example, the representative sequence includes a first sequence Seq_a and a second sequence Seq_b. The first sequence Seq_a includes a first line segment L1 between the first start location point L1_s and the first end location point L1_e. The second sequence Seq_b includes a second line segment L2 between the second start location point L2_s and the second end location point L2_e. The sequence distance between the first sequence Seq_a and the second sequence Seq_b is determined according to the line segment distance between the first line segment L1 and the second line segment L2. The first line segment L1 has directionality (from the first starting point L1_s to the first end point L1_e), and the second line segment L2 also has directionality (from the second starting point L2_s to the second end point L1_e). Therefore, in the following description, the vector as well as represent the first line segment L1 and the second line segment L2 respectively.

图9绘示依据本发明一实施例之计算代表性序列之间的序列距离的流程图。第一序列Seq_a及第二序列Seq_b形成代表性序列中的一个序列对，对于每一个序列对，步骤S300(将代表性序列分类为集合)更可包括步骤S310以及步骤S320。步骤S310，计算第一线段L1与第二线段L2之间的线段距离，线段距离代表这两个线段的靠近程度或相似程度。步骤S320，根据第一线段L1与第二线段L2之间的线段距离，决定第一序列Seq_a与第二序列Seq_b之间的序列距离，换言之，两个代表性序列之间的相似程度，是根据这两个代表性序列分别具有的线段之间的相似程度而决定。FIG. 9 is a flow chart of calculating the sequence distance between representative sequences according to an embodiment of the present invention. The first sequence Seq_a and the second sequence Seq_b form a sequence pair among the representative sequences, and for each sequence pair, the step S300 (classifying the representative sequences into sets) may further include steps S310 and S320. Step S310, calculating the line segment distance between the first line segment L1 and the second line segment L2, where the line segment distance represents the closeness or similarity of the two line segments. Step S320, according to the line segment distance between the first line segment L1 and the second line segment L2, determine the sequence distance between the first sequence Seq_a and the second sequence Seq_b, in other words, the degree of similarity between the two representative sequences is It is determined according to the degree of similarity between the line segments of the two representative sequences.

图10绘示依据本发明一实施例之计算第一线段与第二线段之间的线段距离的流程图。步骤S310可以包括下列步骤。步骤S311，计算第一线段L1与第二线段L2之间的角距离d_θ(Angle Distance)、垂直距离d_⊥(Perpendicular Distance)、及平行距离d_||(ParallelDistance)。步骤S312，根据角距离d_θ、垂直距离d_⊥、及平行距离d_||，计算正规化(Normalized)角距离Nd_θ、正规化垂直距离Nd_⊥、及正规化平行距离Nd_||，其中正规化角距离Nd_θ、正规化垂直距离Nd_⊥、及正规化平行距离Nd_||的值在相同的值域内。步骤S313，根据正规化角距离Nd_θ、正规化垂直距离Nd_⊥、及正规化平行距离Nd_||的加权总和，决定第一线段L1与第二线段L2之间的线段距离。以下提供一范例说明关于两个线段之间线段距离的计算。FIG. 10 is a flow chart of calculating a line segment distance between a first line segment and a second line segment according to an embodiment of the present invention. Step S310 may include the following steps. Step S311 , calculating the angular distance d _θ (Angle Distance), the vertical distance d _⊥ (Perpendicular Distance), and the parallel distance d _|| (ParallelDistance) between the first line segment L1 and the second line segment L2. Step S312, according to the angular distance d _θ , the vertical distance d _⊥ , and the parallel distance d _|| , calculate the normalized angular distance Nd _θ , the normalized vertical distance Nd _⊥ , and the normalized parallel distance Nd _|| , where the normalized The values of the normalized angular distance Nd _θ , the normalized vertical distance Nd _⊥ , and the normalized parallel distance Nd _|| are in the same value range. Step S313 , according to the weighted sum of the normalized angular distance Nd _θ , the normalized vertical distance Nd _⊥ , and the normalized parallel distance Nd _|| , determine the line segment distance between the first line segment L1 and the second line segment L2 . An example is provided below to illustrate the calculation of the line segment distance between two line segments.

图11绘示依据本发明一实施例之两线段之间线段距离的示意图。线段距离由三个成份决定：角距离d_θ、垂直距离d_⊥、以及平行距离d_||。角距离d_θ相关于与之间的夹角θ(0≤θ≤180°)。举例而言，夹角θ可根据式子计算，其中是两个向量的内积(dot product)， FIG. 11 is a schematic diagram illustrating the line segment distance between two line segments according to an embodiment of the present invention. The line segment distance is determined by three components: the angular distance d _θ , the vertical distance d _⊥ , and the parallel distance d _|| . The angular distance d _{θ is} related to and The angle between them is θ (0≤θ≤180°). For example, the included angle θ can be based on the formula calculation, where is the inner product (dot product) of two vectors,

及代表两个向量的长度。角距离d_θ可根据以下式子(1)计算：and Represents the lengths of the two vectors. The angular distance d _θ can be calculated according to the following formula (1):

角距离d_θ代表两个向量在指向方向的相似程度，夹角θ越小，角距离d_θ越小。而当夹角θ大于90°时，相当于两个向量实质上指向相反方向，则此时的角距离d_θ可设为角距离域(Domain)的最大可能值，以表示这两个向量在方向上并不相似。The angular distance d _θ represents the similarity of two vectors in the pointing direction, the smaller the angle θ is, the smaller the angular distance d _θ is. And when the angle θ is greater than 90°, it means that the two vectors are pointing in opposite directions substantially, then the angular distance d _θ at this time can be set as the maximum possible value of the angular distance domain (Domain), to indicate that the two vectors are in The directions are not similar.

图12绘示依据本发明一实施例之计算第一线段与第二线段之间平行距离的流程图。步骤S311包括下列步骤。步骤S331，将第二起始位置点L2_s投影在第一线段L1的延伸线，以获得第三起始投影点L3_s。步骤S332，将第二结束位置点L2_e投影在第一线段L1的延伸线，以获得第三结束投影点L3_e。步骤S333，连接第三起始投影点L3_s及第三结束投影点L3_e，以产生第三线段L3，如此产生的第三起始投影点L3_s、第三结束投影点L3_e、以及第三线段L3，可见图11所绘示。步骤S334，将第一线段L1与第三线段L3的并集(Union)减去第一线段L1与第三线段L3的交集(Intersection)，以决定平行距离d_||。平行距离d_||可根据以下式子(2)计算：FIG. 12 is a flow chart of calculating the parallel distance between the first line segment and the second line segment according to an embodiment of the present invention. Step S311 includes the following steps. Step S331 , project the second initial point L2_s on the extension line of the first line segment L1 to obtain a third initial projection point L3_s. Step S332 , project the second end point L2_e on the extension line of the first line segment L1 to obtain a third end projection point L3_e. Step S333, connecting the third initial projection point L3_s and the third end projection point L3_e to generate a third line segment L3, the third initial projection point L3_s, the third end projection point L3_e, and the third line segment L3 thus generated, It can be seen in Figure 11. Step S334 , subtracting the intersection of the first line segment L1 and the third line segment L3 from the union of the first line segment L1 and the third line segment L3 to determine the parallel distance d _|| . The parallel distance d _|| can be calculated according to the following formula (2):

d_||＝L1∪L3-L1∩L3 (2)d _|| ＝L1∪L3-L1∩L3 (2)

由于第三线段L3是经由将第二线段L2投影至第一线段L1而形成，因此第三线段L3与第一线段L1共线(Collinear)。在图11所示的范例中，第一线段L1与第三线段L3的并集，是从第一起始位置点L1_s到第一结束位置点L1_e的长度，而第一线段L1与第三线段L3的交集，是从第三起始位置点L3_s到第三结束位置点L3_e的长度。在此范例中是将第二线度L2投影至第一线度L1，在其他实施例中，亦可以将第一线段L1投影至(延伸的)第二线段L2而获得平行距离d_||(计算出来的值可能不同)。平行距离d_||代表两个线段的等效平行长度之间的相似程度。Since the third line segment L3 is formed by projecting the second line segment L2 onto the first line segment L1, the third line segment L3 is collinear with the first line segment L1. In the example shown in Fig. 11, the union of the first line segment L1 and the third line segment L3 is the length from the first starting position point L1_s to the first end position point L1_e, and the first line segment L1 and the third line segment The intersection of the line segment L3 is the length from the third starting position point L3_s to the third end position point L3_e. In this example, the second line L2 is projected onto the first line L1. In other embodiments, the first line segment L1 may also be projected onto the (extended) second line segment L2 to obtain the parallel distance d _|| ( Calculated values may vary). The parallel distance d _|| represents the degree of similarity between the equivalent parallel lengths of two line segments.

垂直距离d_⊥可根据以下式子(3)计算：The vertical distance d _⊥ can be calculated according to the following formula (3):

其中l_⊥s是第二起始位置点L2_s与第三起始位置点L3_s之间的欧氏距离(Euclidean Distance)，l_⊥e是第二结束位置点L2_e与第三结束位置点L3_e之间的欧氏距离。式子(3)代表l_⊥s与l_⊥e的反调和平均(Contraharmonic Mean)。Where l _⊥s is the Euclidean distance between the second starting point L2_s and the third starting point L3_s, l _⊥e is the distance between the second ending point L2_e and the third ending point L3_e Euclidean distance of . Equation (3) represents the Contraharmonic Mean of l _⊥s and l _⊥e .

图13绘示依据本发明一实施例之两线段之间线段距离的示意图。同样可以分别依据式子(1)、(2)、(3)计算角距离d_θ、平行距离d_||、垂直距离d_⊥。在此范例中，夹角θ大于90°，因此角距离d_θ等于至于平行距离d_||，第一线段L1与第三线段L3的并集，是从第一起始位置点L1_s到第三结束位置点L3_e的长度，而第一线段L1与第三线段L3的交集，是从第三起始位置点L3_s到第一结束位置点L1_e的长度。FIG. 13 is a schematic diagram illustrating the line segment distance between two line segments according to an embodiment of the present invention. Similarly, the angular distance d _θ , the parallel distance d _|| , and the vertical distance d _⊥ can be calculated according to formulas (1), (2), and (3) respectively. In this example, the included angle θ is greater than 90°, so the angular distance d _θ is equal to As for the parallel distance d _|| , the union of the first line segment L1 and the third line segment L3 is the length from the first starting position point L1_s to the third end position point L3_e, and the first line segment L1 and the third line segment L3 The intersection of is the length from the third start point L3_s to the first end point L1_e.

如上所述，当计算线段距离时同时考虑三个成份。然而，由于这三个成份的值域可能相差很大，造成不容易根据这三个成份取得一个有意义的组合。于本发明的方法中，在步骤S312计算正规化角距离Nd_θ、正规化平行距离Nd_||、及正规化垂直距离Nd_⊥，其中正规化角距离Nd_θ、正规化平行距离Nd_||、及正规化垂直距离Nd_⊥的值在相同的值域内，例如是[0，1]，[0，1]代表大于等于0、小于等于1的区间。由于这三个正规化成份的值在相同的值域内，这三个正规化成份的线性组合对于计算两线段之间的线段距离便具有意义。在一实施例中，线段距离是正规化角距离Nd_θ、正规化平行距离Nd_||、及正规化垂直距离Nd_⊥的加权总和，线段距离可根据以下式子(4)计算：As mentioned above, three components are considered simultaneously when calculating line segment distances. However, since the value ranges of these three components may vary greatly, it is not easy to obtain a meaningful combination based on these three components. In the method of the present invention, in step _S312 , the normalized angular distance _{Nd θ} _, the normalized parallel distance _Nd _|| and the value of the normalized vertical distance Nd _⊥ are in the same value range, for example, [0, 1], [0, 1] represents an interval greater than or equal to 0 and less than or equal to 1. Since the values of these three normalization components are in the same value range, the linear combination of these three normalization components is meaningful for calculating the line segment distance between two line segments. In one embodiment, the line segment distance is the weighted sum of the normalized angular distance Nd _θ , the normalized parallel distance Nd _|| , and the normalized vertical distance Nd _⊥ , and the line segment distance can be calculated according to the following formula (4):

线段距离＝w₁×Nd_θ+w₂×ND_||+w₃×ND_⊥其中 Line segment distance＝w ₁ ×Nd _θ +w ₂ ×ND _|| +w ₃ ×ND _⊥ where

举例而言，w1、w2、w3可以都等于以获得正规化角距离Nd_θ、正规化平行距离Nd_||、及正规化垂直距离Nd_⊥的平均值。For example, w1, w2, w3 could all be equal to to obtain the average of the normalized angular distance Nd _θ , the normalized parallel distance Nd _|| , and the normalized vertical distance Nd _⊥ .

图14绘示依据本发明一实施例之计算正规化角距离、正规化垂直距离、及正规化平行距离的流程图。步骤S312可包括下列步骤。步骤S341，将角距离d_θ除以角距离域的最大值，以得到正规化角距离Nd_θ。步骤S342，将垂直距离d_⊥除以垂直距离域的最大值，以得到正规化垂直距离Nd_⊥。步骤S343，将平行距离d_||除以平行距离域的最大值，以得到正规化平行距离Nd_||。由于三个正规化距离皆是藉由除以在各自距离域的最大值而产生，因此这三个正规化距离成份的值都会在[0，1]范围。FIG. 14 shows a flow chart of calculating normalized angular distance, normalized vertical distance, and normalized parallel distance according to an embodiment of the present invention. Step S312 may include the following steps. Step S341, dividing the angular distance d _θ by the maximum value of the angular distance domain to obtain the normalized angular distance Nd _θ . Step S342, dividing the vertical distance d _⊥ by the maximum value of the vertical distance field to obtain the normalized vertical distance Nd _⊥ . Step S343 , dividing the parallel distance d _|| by the maximum value of the parallel distance field to obtain the normalized parallel distance Nd _|| . Since the three normalized distances are all generated by dividing by the maximum values in the respective distance domains, the values of the three normalized distance components are all in the range [0, 1].

如式子(1)所示，角距离域的最大值是第一线段L1以及第二线段L2当中较短一者的长度。如式子(2)所示，平行距离域的最大值是第一线段L1与第三线段L3的并集。垂直距离域的最大值不易直接从式子(3)看出，其相关计算说明如下。As shown in formula (1), the maximum value of the angular distance field is the length of the shorter one of the first line segment L1 and the second line segment L2 . As shown in formula (2), the maximum value of the parallel distance field is the union of the first line segment L1 and the third line segment L3. The maximum value of the vertical distance field is not easy to see directly from the formula (3), and its related calculation is explained as follows.

图15A～图15D绘示依据本发明一实施例之考虑垂直距离域最大值的多种情形示意图。根据式子(3)以及第一线段L1与第二线段L2的几何关系，垂直距离域的最大值发生在垂直于时。因此，在一实施例中，可将第二线段L2绕着第二起始位置点L2s或绕着第二结束位置点L2e旋转，直到垂直于第一线段L1垂直距离域的最大值是第一线段L1与旋转后的第二线段之间的垂直距离。图15A～图15D绘示四种可能的旋转情形。垂直距离域的最大值是这四种可能情形当中最大的垂直距离，垂直距离域的最大值可根据以下式子(5)计算：15A-15D are schematic diagrams illustrating various situations considering the maximum value of the vertical distance field according to an embodiment of the present invention. According to formula (3) and the geometric relationship between the first line segment L1 and the second line segment L2, the maximum value of the vertical distance domain occurs at perpendicular to Time. Therefore, in one embodiment, the second line segment L2 can be rotated around the second start point L2s or around the second end point L2e until it is perpendicular to the first line segment L1 The maximum value of the vertical distance field is the vertical distance between the first line segment L1 and the rotated second line segment. 15A-15D illustrate four possible rotation situations. The maximum value of the vertical distance field is the maximum vertical distance among the four possible situations, and the maximum value of the vertical distance field can be calculated according to the following formula (5):

其中l₂代表第二线段L2的长度。Where l ₂ represents the length of the second line segment L2.

两线段之间的线段距离可根据以上的计算程序获得，而两个代表性序列之间的序列距离，则可以根据这两个代表性序列分别具有的线段之间的线段距离而决定。图16绘示依据本发明一实施例之决定第一序列与第二序列之间序列距离的流程图。步骤S320包括下列步骤。步骤S321，根据第一序列Seq_a当中至少一线段与第二序列Seq_b当中至少一线段，在第一序列Seq_a与第二序列Seq_b之间产生多个映像(Mapping)组合。步骤S322，计算每个映像组合的映像距离。步骤S333，以每个映像组合当中的最小映像距离，作为第一序列Seq_a与第二序列Seq_b之间的序列距离。The line segment distance between two line segments can be obtained according to the above calculation procedure, and the sequence distance between two representative sequences can be determined according to the line segment distance between the line segments of the two representative sequences respectively. FIG. 16 is a flow chart of determining the sequence distance between the first sequence and the second sequence according to an embodiment of the present invention. Step S320 includes the following steps. Step S321, according to at least one segment in the first sequence Seq_a and at least one segment in the second sequence Seq_b, generate a plurality of mapping (Mapping) combinations between the first sequence Seq_a and the second sequence Seq_b. Step S322, calculating the image distance of each image combination. Step S333, using the minimum image distance in each image combination as the sequence distance between the first sequence Seq_a and the second sequence Seq_b.

第一序列Seq_a可以是多个线段依据时间顺序排列形成的序列，第一序列Seq_a例如可包括两个线段LineSega1以及LineSega2，线段LineSega1代表的移动轨迹早于线段LineSega2代表的移动轨迹。图17A～图17C绘示依据本发明一实施例之两个代表性序列之间多个映像组合的示意图。在此范例中，第二序列Seq_b亦包括两个依据时间顺序排列的线段LineSegb1以及LineSegb2。The first sequence Seq_a may be a sequence formed by arranging a plurality of line segments in chronological order. For example, the first sequence Seq_a may include two line segments LineSega1 and LineSega2. The trajectory represented by the line segment LineSega1 is earlier than the trajectory represented by the line segment LineSega2. 17A-17C are schematic diagrams illustrating combinations of multiple images between two representative sequences according to an embodiment of the present invention. In this example, the second sequence Seq_b also includes two line segments LineSegb1 and LineSegb2 arranged in time order.

在图17A中，线段映射LineSegb1至一个空线段φ，线段LineSegb2映射至线段LineSega1，而线段LineSega2映射至一个空线段φ。值得注意的是，在各代表性序列的时间顺序仍然维持原本的时间顺序。第17B图以及图17C分别绘示不同的映像组合，在各代表性序列的时间顺序同样是维持原本的时间顺序。图18绘示两个代表性序列之间的一种无效映像的范例示意图，此映像违反时间顺序，因为线段LineSega2(映射至线段LineSegb1)发生晚于线段LineSega1(映射至线段LineSegb2)，然而线段LineSegb1发生早于线段LineSegb2。对于每一个有效的映像组合(如图17A～图17C所示)，可以计算一个映射距离。第一序列Seq_a与第二序列Seq_b之间的序列距离，可以是各映像组合当中的最小映像距离。In FIG. 17A, the line segment Maps LineSegb1 to an empty line segment φ, the line segment LineSegb2 maps to the line segment LineSega1, and the line segment LineSega2 maps to an empty line segment φ. It is worth noting that the chronological order of each representative series still maintains the original chronological order. FIG. 17B and FIG. 17C show different image combinations respectively, and the time order of each representative sequence also maintains the original time order. FIG. 18 shows an example schematic diagram of an invalid mapping between two representative sequences. This mapping violates the chronological order because line segment LineSega2 (mapped to line segment LineSegb1) occurs later than line segment LineSega1 (mapped to line segment LineSegb2), whereas line segment LineSegb1 Occurs earlier than line segment LineSegb2. For each valid mapping combination (as shown in FIGS. 17A-17C ), a mapping distance can be calculated. The sequence distance between the first sequence Seq_a and the second sequence Seq_b may be the minimum image distance among all image combinations.

图19绘示依据本发明一实施例之计算映像组合的映像距离的流程图。步骤S322包括下列步骤。步骤S351，依据时间顺序，在第一序列Seq_a当中至少一线段与第二序列Seq_b当中至少一线段之间，形成多个映射对(Mapping Pair)。步骤S352，计算每个映射对的线段距离。步骤S353，计算每个映射对的线段距离的平均值，以获得映射距离。FIG. 19 is a flow chart of calculating the image distance of an image combination according to an embodiment of the present invention. Step S322 includes the following steps. Step S351 , according to time order, form a plurality of mapping pairs (Mapping Pairs) between at least one segment in the first sequence Seq_a and at least one segment in the second sequence Seq_b. Step S352, calculating the line segment distance of each mapping pair. Step S353, calculating the average value of the line segment distances of each mapping pair to obtain the mapping distance.

请参考图17A，此例中的映像组合包括三个映像对：{φ，LineSegb1}，{LineSega1，LineSegb2}，and{LineSega2，φ}。每一个映像对的线段距离，可以依据前述的线段距离计算方式(包括计算三个正规距离成份，步骤S311～S313以及式子(1)～(5))。而一个真实线段与一个空线段φ之间的线段距离可定义为1(线段距离域的最大可能值)。图17A所示的这个映像组合的映像距离，可以是这三个映射对的线段距离的平均值。举例而言，此映像组合的映像距离等于其中Nd代表两个线段之间的线段距离。类似地，第17B图中有两个映射对，映射距离可以是这两个映射对的线段距离的平均值。Please refer to FIG. 17A, the image combination in this example includes three image pairs: {φ, LineSegb1}, {LineSega1, LineSegb2}, and {LineSega2, φ}. The line segment distance of each image pair can be calculated according to the aforementioned line segment distance calculation method (including calculating three normal distance components, steps S311-S313 and formulas (1)-(5)). And the line segment distance between a real line segment and an empty line segment φ can be defined as 1 (the maximum possible value of the line segment distance field). The mapping distance of the mapping combination shown in FIG. 17A may be the average value of the line segment distances of the three mapping pairs. For example, the image distance for this image combination is equal to where Nd represents the line segment distance between two line segments. Similarly, there are two mapping pairs in Figure 17B, and the mapping distance may be the average of the line segment distances of these two mapping pairs.

如上所述的计算方式，可以计算出两个代表性序列之间的序列距离。图20绘示依据本发明一实施例之将代表性序列分类为集合的流程图。步骤S300包括下列步骤。步骤S360，将每个代表性序列作为一个集合。步骤S370，计算每个集合对之间的集合距离，集合对是由两个集合所形成。步骤S380，找出具有最小集合距离的第一集合以及第二集合。步骤S390，若最小集合距离小于距离门坎值，合并第一集合以及第二集合。As described above, the sequence distance between two representative sequences can be calculated. FIG. 20 illustrates a flowchart for classifying representative sequences into sets according to an embodiment of the present invention. Step S300 includes the following steps. Step S360, taking each representative sequence as a set. Step S370, calculating the set distance between each set pair, where the set pair is formed by two sets. Step S380, finding the first set and the second set with the minimum set distance. Step S390, if the minimum set distance is less than the distance threshold, merge the first set and the second set.

在此实施例中可以应用聚合式阶层分群(Agglomerative HierarchicalClustering)方法。于初始状态，将每个代表性序列视为一个集合。接着可以计算每一个集合对(由两个集合形成，初始状态为两个代表性序列)之间的集合距离，可依据步骤S351～步骤S353的方法计算出集合距离(因为在初始状态时，即相当于计算两个代表性序列的序列距离)。找出具有最小集合距离的两个集合，若是最小集合距离小于距离门坎值，例如0.3，则将这两个集合合并成为一个较大的集合。接着流程可再回到步骤S370，以重复地进行合并集合。合并后，有些集合会具有多个代表性序列，而对于具有多个代表性序列的集合所形成的集合对，可以计算此集合对当中所有代表性序列配对(所有配对链结，All-PairLinkage)的序列距离的平均值，以作为此集合对的集合距离，其中代表性序列配对是由集合对的两个集合当中各自的一个代表性序列所形成。举例而言，集合G1有两个代表性序列，集合G2有三个代表性序列，则集合G1与集合G2之间的集合距离，可以是2×3＝6个代表性序列配对的序列距离的平均值。In this embodiment, an Agglomerative Hierarchical Clustering method can be applied. In the initial state, each representative sequence is regarded as a set. Then the set distance between each set pair (formed by two sets, and the initial state is two representative sequences) can be calculated, and the set distance can be calculated according to the method of step S351～step S353 (because in the initial state, namely Equivalent to calculating the sequence distance of two representative sequences). Find two sets with the minimum set distance, if the minimum set distance is less than the distance threshold, such as 0.3, then merge the two sets into a larger set. Then the process can go back to step S370 to repeatedly perform merging collection. After merging, some sets will have multiple representative sequences, and for a set pair formed by a set with multiple representative sequences, all representative sequence pairs in this set pair can be calculated (all paired links, All-PairLinkage) The average value of the sequence distances of is used as the set distance of the set pair, wherein the representative sequence pair is formed by a representative sequence in each of the two sets of the set pair. For example, set G1 has two representative sequences, set G2 has three representative sequences, then the set distance between set G1 and set G2 can be the average of the sequence distances of 2×3=6 representative sequence pairs value.

在一实施例中，提供一种找出典型序列(Typical Sequence)的方法。图21绘示依据本发明一实施例之找出人群移动行为以及找出典型序列的流程图。与图2的流程图相比，图21更包括步骤S410及步骤S420。步骤S410，将日期分为多种日期类型。举例而言，日期可以分类为工作日与休假日，而工作日可进一步分类为单日休假日之前的最后一个工作日、至少双日休假日之前的最后一个工作日、单日休假日之后的第一个工作日等等。类似地，休假日可进一步分类为单日休假日、至少双日休假日的第一个休假日、至少双日休假日的最后一个休假日等等。步骤S420，根据代表性序列在目标日期类型的出现率，找出目标日期类型的典型序列。如图8所示，在聚合数日的数据后，可以得到代表性序列在特定日期类型的出现率，根据出现率，可以找出特定日期类型的典型序列。举例而言，在至少双日休假日的最后一个休假日，可能可以找到的典型序列是在两个火车站之间的移动轨迹。In one embodiment, a method for finding a typical sequence (Typical Sequence) is provided. FIG. 21 is a flow chart of finding out crowd movement behavior and finding out a typical sequence according to an embodiment of the present invention. Compared with the flowchart of FIG. 2 , FIG. 21 further includes step S410 and step S420 . Step S410, classify the date into multiple date types. For example, dates can be categorized as working days and holidays, and working days can be further categorized as the last working day before a single-day holiday, the last working day before at least a double-day holiday, the first business day and so on. Similarly, vacation days may be further classified as a single day off, the first off day of at least two days off days, the last off day of at least two day off days, and so on. Step S420, find out the typical sequence of the target date type according to the occurrence rate of the representative sequence in the target date type. As shown in Figure 8, after aggregating the data of several days, the occurrence rate of the representative sequence on a specific date type can be obtained, and according to the occurrence rate, the typical sequence of a specific date type can be found. For example, a typical sequence that might be found is a movement trajectory between two train stations on at least the last holiday of a double-day holiday.

图22绘示依据本发明一实施例之找出目标日期类型的典型序列的流程图。步骤S420包括下列步骤。步骤S421，计算测试代表性序列在属于目标日期类型日子里的第一出现率。步骤S422，计算测试代表性序列在非属于目标日期类型日子里的第二出现率。步骤S423，根据第一出现率以及第二出现率，计算统计熵(Entropy)。步骤S424，若是第一出现率大于几率门坎值，且统计熵小于熵门坎值，决定测试代表性序列为典型序列。FIG. 22 is a flowchart illustrating a typical sequence for finding a target date type according to an embodiment of the present invention. Step S420 includes the following steps. Step S421, calculating the first occurrence rate of the test representative sequence in the days belonging to the target date type. Step S422, calculating the second occurrence rate of the test representative sequence on days that do not belong to the target date type. Step S423, calculate statistical entropy (Entropy) according to the first occurrence rate and the second occurrence rate. In step S424, if the first occurrence rate is greater than the probability threshold and the statistical entropy is less than the entropy threshold, it is determined that the test representative sequence is a typical sequence.

步骤S421当中的出现率，可以在执行步骤S230之后得到(如图8所示，聚合数日的资料)，以下以一例子说明关于步骤S421～S424执行的计算。此例中的目标日期类型是至少双日休假日的第一个休假日，以class H表示。另一方面，以class(all-H)表示不是属于class H的休假日。下面表一列出两个代表性序列，以及这两个代表性序列对应的出现率。The occurrence rate in step S421 can be obtained after step S230 is executed (as shown in FIG. 8 , the data of several days are aggregated). The following uses an example to illustrate the calculation performed in steps S421-S424. The target date type in this example is the first holiday that has at least two days off, denoted by class H. On the other hand, holidays that do not belong to class H are represented by class(all-H). Table 1 below lists two representative sequences and their corresponding occurrence rates.

表一Table I

出现率即代表这个序列在这些日子出现的次数，例如序列R1在属于class H的56个日子内总共出现在41个日子，而在属于class(all-H)的128个日子内总共出现在2个日子。此例中步骤S424的几率门坎值Pth等于0.2、熵门坎值Sth等于0.6。步骤S423的统计熵可根据以下式子(6)计算：The occurrence rate represents the number of times the sequence appears on these days. For example, the sequence R1 appears on 41 days in 56 days belonging to class H, and appears in 2 days in 128 days belonging to class (all-H). a day. In this example, the probability threshold Pth of step S424 is equal to 0.2, and the entropy threshold Sth is equal to 0.6. The statistical entropy of step S423 can be calculated according to the following formula (6):

其中p_i是序列在class i的几率 (6) where p _i is the probability that the sequence is in class i (6)

根据式子(6)，序列R1的统计熵S1等于0.1熵值较大(乱度较大)代表几率分布较接近于均匀分布(Uniform Distribution)，而熵值较小则代表几率分布偏向其中一端。在上述的例子中，若是几率分布偏向class H，则此序列可以视为class H日期里的典型序列。在步骤S424，由于第一出现率(41/56)大于几率门坎值Pth，且统计熵S1＝0.1小于熵门坎值Sth，序列R1可被决定为class H日期里的典型序列。类似地，序列R2的统计熵S2亦可以根据式子(6)计算，可得到S2＝0.69，由于统计熵S2大于熵门坎值Sth，因此序列R2并不是class H日期里的典型序列。如表一所示，序列R2在class H日期里的第一出现率(28/56)与在class(all-H)日期里的第二出现率(50/128)相近，意即序列R2并不是特别会出现在哪一种日期类型当中，因此序列R2不是一个典型序列。在找出一个日期类型的多个典型序列之后，可以将多个典型序列进一步分类为集合，分类的方法可以如步骤S360、S370、S380、S390所示。According to formula (6), the statistical entropy S1 of sequence R1 is equal to 0.1 A larger entropy value (larger disorder) means that the probability distribution is closer to a uniform distribution (Uniform Distribution), while a smaller entropy value means that the probability distribution is biased towards one end. In the above example, if the probability distribution is biased towards class H, then this sequence can be regarded as a typical sequence in class H dates. In step S424, since the first occurrence rate (41/56) is greater than the probability threshold Pth, and the statistical entropy S1=0.1 is less than the entropy threshold Sth, the sequence R1 can be determined as a typical sequence in class H dates. Similarly, the statistical entropy S2 of the sequence R2 can also be calculated according to formula (6), and S2=0.69 can be obtained. Since the statistical entropy S2 is greater than the entropy threshold Sth, the sequence R2 is not a typical sequence in class H dates. As shown in Table 1, the first occurrence rate (28/56) of the sequence R2 in the class H date is similar to the second occurrence rate (50/128) in the class (all-H) date, which means that the sequence R2 does not It does not appear in any particular date type, so sequence R2 is not a typical sequence. After finding multiple typical sequences of a date type, the multiple typical sequences can be further classified into sets, and the classification method can be shown in steps S360, S370, S380, and S390.

依据本发明实施例的找出人群移动行为的方法，可藉由收集从用户装置撷取的位置数据，找出人群移动行为轨迹，用户装置例如是智能卡。发行智能卡的支付服务提供商可以依据得到的人群移动行为轨迹，以估计特定地理区域的持卡人数、对应地设计营销与广告计划、决定开设新店面的位置等等，根据本发明实施例的找出人群移动行为的方法，可以有许多层面的应用，并有助于作出重要决策。进一步而言，由于本发明实施例更可以找出特定日期类型的典型序列，支付服务提供商能够根据不同日期类型的典型序列，对应地规划与组织属于不同日期类型的活动。According to the method for finding out crowd movement behavior according to the embodiment of the present invention, the track of crowd movement behavior can be found by collecting location data captured from a user device, such as a smart card. The payment service provider who issues the smart card can estimate the number of cardholders in a specific geographical area based on the obtained crowd movement behavior trajectory, design a marketing and advertising plan accordingly, decide the location of a new store, and so on. The method of understanding the movement behavior of crowds can be applied at many levels and help to make important decisions. Furthermore, since the embodiment of the present invention can find out the typical sequence of a specific date type, the payment service provider can plan and organize activities belonging to different date types correspondingly according to the typical sequence of different date types.

综上所述，虽然本发明已以较佳实施例公开如上，然其并非用以限定本发明。本发明所属技术领域中具有通常知识者，在不脱离本发明的精神和范围内，当可作各种的更动与润饰。因此，本发明的保护范围当以权利要求保护范围为准。In summary, although the present invention has been disclosed as above with preferred embodiments, it is not intended to limit the present invention. Those skilled in the art of the present invention can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of claims.

Claims

1. a kind of method for finding out crowd's mobile behavior, including：

Collect the multiple position datas with regard to multiple user's sets；

The multiple usual pattern in those position datas is detected to produce multiple representative series, wherein the respectively representative series bag An at least line segment is included, an at least line segment is between source location set and end position point；And

According to the multiple sequence distances between those representative series, those representative series are categorized as into multiple set, to look for Go out crowd's mobile behavior.

2. the method for claim 1, wherein those representative series are further included：

First ray, including the first line segment, first line segment is put a little and the first end position point between in first start bit；And

Second sequence, including second line segment, the second line segment is between the second source location set and the second end position point；

Wherein the First ray and second sequence form the sequence pair in those representative series, for each sequence It is right, those representative series are categorized as to further include the step of those are gathered：

Calculate the line segment distance between first line segment and the second line segment；And

According to the line segment distance between first line segment and the second line segment, determine between the First ray and second sequence Sequence distance.

3. method as claimed in claim 2, wherein calculating the line segment distance between first line segment and the second line segment Step is further included：

Calculate angular distance between first line segment and the second line segment, vertical range and parallel distance；

According to the angular distance, the vertical range and the parallel distance, regular angular distance, regular vertical range and just are calculated Ruleization parallel distance, wherein the value of the regular angular distance, the regular vertical range and the regular parallel distance is identical Codomain in；And

According to the weighted sum of the regular angular distance, the regular vertical range and the regular parallel distance, determine this The line segment distance between one line segment and the second line segment.

4. method as claimed in claim 3, wherein calculating the parallel distance between first line segment and the second line segment Step is further included：

Second source location set is projected in into the extension line of first line segment, to obtain the 3rd initial subpoint；

By the second end position spot projection in the extension line of first line segment, to obtain the 3rd subpoint is terminated；

Connect the 3rd initial subpoint and the 3rd end subpoint, to produce the 3rd line segment；And

The union of first line segment and the 3rd line segment is deducted into the common factor of first line segment and the 3rd line segment, to determine that this is put down Row distance.

5. method as claimed in claim 4, wherein calculating the regular angular distance, the regular vertical range and this is regular The step of changing parallel distance further includes：

By the angular distance divided by angular distance delocalization maximum, to obtain the regular angular distance；

By the vertical range divided by vertical range domain maximum, to obtain the regular vertical range；And

By the parallel distance divided by parallel distance domain maximum, to obtain the regular parallel distance.

6. method as claimed in claim 5, the maximum of the wherein angular distance delocalization is that first line segment is worked as with the second line segment The length of middle shorter one, the maximum in the parallel distance domain is the union of first line segment and the 3rd line segment, this vertically away from The maximum of delocalization is the vertical range after first line segment and rotation between line segment, and line segment is by should wherein after the rotation Second line segment around second source location set or around the second end position point rotation, until perpendicular to first line segment with Obtain.

7. method as claimed in claim 2, wherein determining the sequence distance between the First ray and second sequence Step is further included：

According to an at least line segment in the middle of an at least line segment in the middle of the First ray and second sequence, the First ray with Multiple image combinations are produced between second sequence；

Calculate the image distance of the respectively image combination；And

With the minimum mapping distance in the middle of the respectively image combination, as the sequence between the First ray and second sequence away from From.

8. method as claimed in claim 7, wherein calculate the image of the respectively image combination apart from the step of further include：

According to time sequencing, in the middle of the First ray in the middle of an at least line segment and second sequence an at least line segment it Between, form multiple mappings right；

Calculate respectively the mapping to line segment distance；And

Calculate respectively the mapping to the line segment distance mean value, to obtain the mapping distance.

9. the method for claim 1, wherein those position datas are received when those user's sets are used for and pay activity Collection.

10. the method for claim 1, wherein the step of collecting those position datas with regard to those user's sets is more wrapped Include：

Select multiple reference position points；And

By each location point in those position datas, it is substituted by and immediate those reference bits in each location point geographical position Put a little one of them.

11. the method for claim 1, wherein detecting those the usual patterns in those position datas to produce those generations The step of table sequence, further includes：

Remove the usual pattern only in those usual patterns with single location point；

In the respectively usual pattern, identical adjacent position point is removed；And

By those usual Pattern Aggregations of a few days producing those representative series.

12. the method for claim 1, wherein the step of those representative series are categorized as into those set further includes：

Respectively the representative series will gather as one；

Select two set to be formed in gathering from place near the steps and gather right, calculate respectively aggregate distance of the set between；

Find out the first set with minimal set distance and second set；And

If the minimal set distance is less than apart from threshold value, merge the first set and the second set.

13. the method for claim 1, further include：

Multiple date types will be divided into the date；And

According to those representative series in the occurrence rate of target date type, the type sequence of the target date type is found out.

14. methods as claimed in claim 13, wherein the step of finding out the type sequence of the target date type further includes：

Calculate test representative series and belong to the first occurrence rate in type date target date；

The test representative series are calculated in non-the second occurrence rate belonged in type date target date；

According to first occurrence rate and second occurrence rate, counting statistics entropy；And

If first occurrence rate is more than probability threshold value, and the statistical entropy is less than entropy threshold value, determines the test representativeness sequence It is classified as the type sequence.