KR102796540B1

KR102796540B1 - Method and apparatus for matching sub graph

Info

Publication number: KR102796540B1
Application number: KR1020220026410A
Authority: KR
Inventors: 이영구; 티 투 반 즈엉
Original assignee: 경희대학교 산학협력단
Priority date: 2021-02-26
Filing date: 2022-02-28
Publication date: 2025-04-16
Anticipated expiration: 2042-02-28
Also published as: KR20220122562A

Abstract

서브 그래프 매칭 방법 및 장치가 개시된다. 개시되는 일 실시예에 따른 서브 그래프 매칭 장치는, 복수 개의 그래프 데이터를 획득하고, 획득한 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성하는 직렬화 모듈, 데이터 스트림을 그래프 스트림으로 변환하는 그래프 복원 모듈, 및 그래프 스트림에 대해 기 설정된 서브 그래프 쿼리에 매칭되는 그래프 패턴을 검색하는 매칭 모듈을 포함한다.A subgraph matching method and device are disclosed. According to one embodiment of the disclosure, the subgraph matching device includes a serialization module that obtains a plurality of graph data and generates a single data stream by combining the obtained plurality of graph data, a graph restoration module that converts the data stream into a graph stream, and a matching module that searches for a graph pattern matching a preset subgraph query for the graph stream.

Description

METHOD AND APPARATUS FOR MATCHING SUB GRAPH

본 발명의 실시예는 서브 그래프 매칭 기술과 관련된다.An embodiment of the present invention relates to a subgraph matching technique.

그래프는 데이터 관계를 표현하는 방식으로, 그래프 구조에서 각 객체 간의 관계가 간선(edge)으로 표현된다. 여기서, 서브 그래프 매칭(subgraph matching)은 대규모 그래프에서 쿼리 서브 그래프와 매칭되는 그래프 패턴을 찾는 그래프 마이닝(graph mining) 작업 중 하나이다. A graph is a way to express data relationships, and the relationships between each object in the graph structure are expressed as edges. Here, subgraph matching is one of the graph mining tasks that finds graph patterns that match a query subgraph in a large graph.

최근, RDF(Resource Description Framework)와 같은 그래프 모델을 이용하여 시맨틱 웹에서 데이터를 표현하고 교환하고 있으며, RDF 그래프 데이터의 규모가 기하 급수적으로 증가하고 있다. 따라서, 대규모의 RDF 그래프 데이터를 위한 효율적인 저장 및 쿼리 메커니즘을 개발하는 것이 필수적이다. Recently, graph models such as RDF (Resource Description Framework) have been used to represent and exchange data on the Semantic Web, and the size of RDF graph data is increasing exponentially. Therefore, it is essential to develop efficient storage and query mechanisms for large-scale RDF graph data.

이러한 개발 작업에는 정보 변경뿐만 아니라 그래프 마이닝 작업의 결과를 즉시 업데이트하는 것도 포함된다. 그러나, 원본 그래프 데이터에 변경 사항이 발생할 때마다 그래프 마이닝(예를 들어, 서브 그래프 매칭 등) 작업을 다시 수행하는 것은 비효율적이 된다. These development tasks include not only information changes but also immediate updates of the results of graph mining tasks. However, it is inefficient to re-perform graph mining tasks (e.g., subgraph matching, etc.) every time there is a change in the original graph data.

공개특허공보 제10-2020-0002027호(2020.01.07)Publication of Patent Publication No. 10-2020-0002027 (2020.01.07)

본 발명의 실시예는 새로 변경된 데이터에 대해서만 서브 그래프 쿼리에 대해 결과를 검색하도록 하여 작업 효율을 높일 수 있는 서브 그래프 매칭 방법 및 장치를 제공하기 위한 것이다.An embodiment of the present invention provides a subgraph matching method and device capable of improving work efficiency by retrieving results for subgraph queries only for newly changed data.

개시되는 일 실시예에 따른 서브 그래프 매칭 장치는, 복수 개의 그래프 데이터를 획득하고, 획득한 상기 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성하는 직렬화 모듈; 상기 데이터 스트림을 그래프 스트림으로 변환하는 그래프 복원 모듈; 및 상기 그래프 스트림에 대해 기 설정된 서브 그래프 쿼리에 매칭되는 그래프 패턴을 검색하는 매칭 모듈을 포함한다.A sub-graph matching device according to one embodiment of the present disclosure includes a serialization module for obtaining a plurality of graph data and generating a single data stream by combining the obtained plurality of graph data; a graph restoration module for converting the data stream into a graph stream; and a matching module for searching a graph pattern matching a preset sub-graph query for the graph stream.

상기 직렬화 모듈은, 상기 복수 개의 그래프 데이터 각각에서 간선(Edge) 정보들을 추출하고, 추출한 상기 간선 정보들을 일렬로 연결하여 하나의 데이터 스트림을 생성할 수 있다.The above serialization module can extract edge information from each of the plurality of graph data and connect the extracted edge information in a row to create a single data stream.

상기 직렬화 모듈은, 상기 하나의 데이터 스트림을 기 설정된 단위의 배치 사이즈(batch size)로 분할하고, 상기 그래프 복원 모듈은, 상기 분할된 각 데이터 스트림의 간선 정보에 기반하여 각 데이터 스트림에 대해 타임 스탬프마다 그래프 데이터를 복원하여 그래프 스트림으로 변환할 수 있다.The above serialization module divides the one data stream into batch sizes of preset units, and the graph restoration module can restore graph data for each time stamp for each data stream based on edge information of each divided data stream and convert it into a graph stream.

상기 매칭 모듈은, 상기 그래프 스트림에서 타임 스탬프의 변화에 따라 변경되는 간선 정보를 추적하여 스트림 그래프 테이블(Stream Graph Table)을 생성하고, 생성한 상기 스트림 그래프 테이블을 이용하여 상기 서브 그래프 쿼리에 매칭되는 그래프 패턴을 검색할 수 있다.The above matching module can generate a stream graph table by tracking edge information that changes according to a change in a timestamp in the graph stream, and search for a graph pattern matching the subgraph query using the generated stream graph table.

상기 스트림 그래프 테이블(Stream Graph Table)은, 에지 레이블(edge label), 추가 인덱스(added index), 추가 에지(added edges), 제거 인덱스(removed index), 및 제거 에지(removed edges)의 칼럼들을 포함하고, 상기 에지 레이블(edge label)은 그래프의 각 간선의 출발 및 도착 정점 레이블의 집합을 저장하는 칼럼이고, 상기 추가 인덱스(added index)는 그래프에서 추가된 간선의 인덱스를 저장하는 칼럼이며, 상기 추가 에지(added edges)는 각 에지 레이블과 관련된 간선의 집합을 저장하는 칼럼이고, 상기 제거 인덱스(removed index)는 그래프에서 제거된 간선의 인덱스를 저장하는 칼럼이고, 상기 제거 에지(removed edges)는 각 에지 레이블과 관련하여 제거된 간선의 집합을 저장하는 칼럼일 수 있다.The above stream graph table may include columns of edge label, added index, added edges, removed index, and removed edges, wherein the edge label may be a column storing a set of starting and arriving vertex labels of each edge of the graph, the added index may be a column storing an index of an edge added to the graph, the added edges may be a column storing a set of edges associated with each edge label, the removed index may be a column storing an index of an edge removed from the graph, and the removed edges may be a column storing a set of edges removed associated with each edge label.

상기 매칭 모듈은, 상기 서브 그래프 쿼리에 대해 순차 간선(sequential edges) 정보를 생성하며, 상기 순차 간선 정보는 서브 그래프 쿼리에 포함된 각 간선들의 시퀀스에 대한 정보로서, 각 간선의 정보는 해당 간선의 출발 정점의 ID와 도착 정점의 ID로 구성될 수 있다.The above matching module generates sequential edges information for the subgraph query, and the sequential edges information is information about the sequence of each edge included in the subgraph query, and the information for each edge can be composed of the ID of the starting vertex and the ID of the arrival vertex of the corresponding edge.

상기 매칭 모듈은, 상기 스트림 그래프 테이블의 에지 레이블(edge label) 및 추가 에지(added edges) 칼럼에서 상기 서브 그래프 쿼리에 대한 순차 간선 정보와 매칭되는 항목을 검색할 수 있다.The above matching module can search for items matching the sequential edge information for the subgraph query in the edge label and added edges columns of the stream graph table.

상기 매칭 모듈은, 상기 검색의 결과에 따라 상기 그래프 스트림의 각 타임 스탬프마다 서브 그래프 매칭 결과 테이블을 생성하고, 상기 서브 그래프 매칭 결과 테이블은, 상기 서브 그래프 쿼리의 그래프 패턴과 매칭되는 그래프 패턴에서 각 정점의 ID를 순차적으로 배열한 것일 수 있다.The above matching module generates a subgraph matching result table for each time stamp of the graph stream according to the result of the search, and the subgraph matching result table may sequentially arrange the IDs of each vertex in a graph pattern matching the graph pattern of the subgraph query.

개시되는 일 실시예에 따른 서브 그래프 매칭 방법은, 하나 이상의 프로세서들, 및 상기 하나 이상의 프로세서들에 의해 실행되는 하나 이상의 프로그램들을 저장하는 메모리를 구비한 컴퓨팅 장치에서 수행되는 방법으로서, 복수 개의 그래프 데이터를 획득하는 단계; 획득한 상기 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성하는 단계; 상기 데이터 스트림을 그래프 스트림으로 변환하는 단계; 및 상기 그래프 스트림에 대해 기 설정된 서브 그래프 쿼리에 매칭되는 그래프 패턴을 검색하는 단계를 포함할 수 있다.A subgraph matching method according to one embodiment of the present disclosure is a method performed in a computing device having one or more processors and a memory storing one or more programs executed by the one or more processors, the method including: obtaining a plurality of graph data; generating a single data stream by combining the obtained plurality of graph data; converting the data stream into a graph stream; and searching for a graph pattern matching a preset subgraph query for the graph stream.

상기 하나의 데이터 스트림을 생성하는 단계는, 상기 복수 개의 그래프 데이터 각각에서 간선(Edge) 정보들을 추출하는 단계; 및 추출한 상기 간선 정보들을 일렬로 연결하여 하나의 데이터 스트림을 생성하는 단계를 포함할 수 있다.The step of generating the above one data stream may include the step of extracting edge information from each of the plurality of graph data; and the step of connecting the extracted edge information in a row to generate the one data stream.

상기 서브 그래프 매칭 방법은, 상기 하나의 데이터 스트림을 기 설정된 단위의 배치 사이즈(batch size)로 분할하는 단계를 더 포함하고, 상기 그래프 스트림으로 변환하는 단계는, 상기 분할된 각 데이터 스트림의 간선 정보에 기반하여 각 데이터 스트림에 대해 타임 스탬프마다 그래프 데이터를 복원하여 그래프 스트림으로 변환할 수 있다.The above subgraph matching method further includes a step of dividing the one data stream into batch sizes of preset units, and the step of converting into a graph stream may be performed by restoring graph data for each time stamp for each data stream based on edge information of each of the divided data streams and converting the data into a graph stream.

상기 검색하는 단계는, 상기 그래프 스트림에서 타임 스탬프의 변화에 따라 변경되는 간선 정보를 추적하여 스트림 그래프 테이블(Stream Graph Table)을 생성하는 단계; 및 생성한 상기 스트림 그래프 테이블을 이용하여 상기 서브 그래프 쿼리에 매칭되는 그래프 패턴을 검색하는 단계를 포함할 수 있다.The above-described searching step may include a step of generating a stream graph table by tracing edge information that changes according to a change in a timestamp in the graph stream; and a step of searching for a graph pattern matching the sub-graph query using the generated stream graph table.

상기 검색하는 단계는, 상기 서브 그래프 쿼리에 대해 순차 간선(sequential edges) 정보를 생성하는 단계를 더 포함하고, 상기 순차 간선 정보는 서브 그래프 쿼리에 포함된 각 간선들의 시퀀스에 대한 정보로서, 각 간선의 정보는 해당 간선의 출발 정점의 ID와 도착 정점의 ID로 구성될 수 있다.The above searching step further includes a step of generating sequential edges information for the subgraph query, wherein the sequential edges information is information about the sequence of each edge included in the subgraph query, and the information for each edge may be composed of an ID of a starting vertex and an ID of a destination vertex of the corresponding edge.

상기 검색하는 단계는, 상기 스트림 그래프 테이블의 에지 레이블(edge label) 및 추가 에지(added edges) 칼럼에서 상기 서브 그래프 쿼리에 대한 순차 간선 정보와 매칭되는 항목을 검색할 수 있다.The above searching step can search for items matching the sequential edge information for the subgraph query in the edge label and added edges columns of the stream graph table.

상기 검색하는 단계는, 상기 검색의 결과에 따라 상기 그래프 스트림의 각 타임 스탬프마다 서브 그래프 매칭 결과 테이블을 생성하는 단계를 더 포함하고, 상기 서브 그래프 매칭 결과 테이블은, 상기 서브 그래프 쿼리의 그래프 패턴과 매칭되는 그래프 패턴에서 각 정점의 ID를 순차적으로 배열한 것일 수 있다.The above searching step may further include a step of generating a subgraph matching result table for each time stamp of the graph stream according to a result of the search, and the subgraph matching result table may be a table in which the IDs of each vertex are sequentially arranged in a graph pattern matching the graph pattern of the subgraph query.

개시되는 실시예에 의하면, 그래프 스트림에 대해 스트림 그래프 테이블을 생성하여 그래프가 업데이트될 때마다 추가 또는 삭제되는 간선에 대한 정보를 저장함으로써, 서브 그래프 쿼리에 대해 이전 서브 그래프 매칭 결과를 기반으로 새로 추가 또는 삭제된 간선에 대한 매칭 결과만을 탐색하면 되는 바, 서브 그래프 매칭 작업 효율을 향상시키면서 업무 로드를 줄일 수 있게 된다. According to the disclosed embodiment, by generating a stream graph table for a graph stream and storing information about edges added or deleted whenever the graph is updated, only matching results for newly added or deleted edges need to be searched based on previous subgraph matching results for a subgraph query, thereby improving the efficiency of subgraph matching work while reducing the workload.

도 1은 개시되는 일 실시예에서 그래프 스트림을 나타낸 도면
도 2는 개시되는 일 실시예에서 서브 그래프 쿼리가 주어질 때 그래프 패턴 매칭 과정을 나타낸 도면
도 3은 본 발명의 일 실시예에 따른 서브 그래프 매칭 장치를 나타낸 블록도
도 4는 본 발명의 일 실시예에서 직렬화 모듈이 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성하는 상태를 개략적으로 나타낸 도면
도 5는 본 발명의 일 실시예에 따른 스트림 그래프 테이블을 나타낸 도면
도 6은 본 발명의 일 실시예에서 서브 그래프 쿼리(query q)에 대한 순차 간선 정보를 나타낸 도면
도 7은 본 발명의 일 실시예에서 타임 스탬프 t=0에서 서브 그래프 쿼리와 매칭되는 간선 정보를 검색하는 과정을 나타낸 도면
도 8은 본 발명의 일 실시예에서 그래프 스트림이 타임 스탬프 t=0에서 타임 스탬프 t=1로 바뀔 때의 스트림 그래프 테이블을 나타낸 도면
도 9는 본 발명의 일 실시예에서 타임 스탬프 t=1에서 서브 그래프 쿼리와 매칭되는 간선 정보를 검색하는 과정을 나타낸 도면
도 10은 본 발명의 일 실시예에서 그래프 스트림이 타임 스탬프 t=1에서 타임 스탬프 t=2로 바뀔 때의 스트림 그래프 테이블을 나타낸 도면
도 11은 본 발명의 일 실시예에서 타임 스탬프 t=2에서 서브 그래프 쿼리와 매칭되는 간선 정보를 검색하는 과정을 나타낸 도면
도 12는 본 발명의 일 실시예에 따른 서브 그래프 매칭 방법을 나타낸 흐름도
도 13은 예시적인 실시예들에서 사용되기에 적합한 컴퓨팅 장치를 포함하는 컴퓨팅 환경을 예시하여 설명하기 위한 블록도FIG. 1 is a diagram illustrating a graph stream in one embodiment disclosed.
FIG. 2 is a diagram illustrating a graph pattern matching process when a subgraph query is given in an embodiment disclosed herein.
FIG. 3 is a block diagram showing a subgraph matching device according to one embodiment of the present invention.
FIG. 4 is a diagram schematically illustrating a state in which a serialization module combines multiple graph data to generate one data stream in one embodiment of the present invention.
FIG. 5 is a diagram showing a stream graph table according to one embodiment of the present invention.
FIG. 6 is a diagram showing sequential edge information for a subgraph query (query q) in one embodiment of the present invention.
FIG. 7 is a diagram illustrating a process of searching edge information matching a subgraph query at time stamp t=0 in one embodiment of the present invention.
FIG. 8 is a diagram showing a stream graph table when a graph stream changes from time stamp t=0 to time stamp t=1 in one embodiment of the present invention.
FIG. 9 is a diagram illustrating a process of searching edge information matching a subgraph query at time stamp t=1 in one embodiment of the present invention.
FIG. 10 is a diagram showing a stream graph table when a graph stream changes from time stamp t=1 to time stamp t=2 in one embodiment of the present invention.
FIG. 11 is a diagram illustrating a process of searching edge information matching a subgraph query at time stamp t=2 in one embodiment of the present invention.
Figure 12 is a flow chart showing a subgraph matching method according to one embodiment of the present invention.
FIG. 13 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

이하, 도면을 참조하여 본 발명의 구체적인 실시형태를 설명하기로 한다. 이하의 상세한 설명은 본 명세서에서 기술된 방법, 장치 및/또는 시스템에 대한 포괄적인 이해를 돕기 위해 제공된다. 그러나 이는 예시에 불과하며 본 발명은 이에 제한되지 않는다.Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to help a comprehensive understanding of the methods, devices, and/or systems described herein. However, this is only an example and the present invention is not limited thereto.

본 발명의 실시예들을 설명함에 있어서, 본 발명과 관련된 공지기술에 대한 구체적인 설명이 본 발명의 요지를 불필요하게 흐릴 수 있다고 판단되는 경우에는 그 상세한 설명을 생략하기로 한다. 그리고, 후술되는 용어들은 본 발명에서의 기능을 고려하여 정의된 용어들로서 이는 사용자, 운용자의 의도 또는 관례 등에 따라 달라질 수 있다. 그러므로 그 정의는 본 명세서 전반에 걸친 내용을 토대로 내려져야 할 것이다. 상세한 설명에서 사용되는 용어는 단지 본 발명의 실시예들을 기술하기 위한 것이며, 결코 제한적이어서는 안 된다. 명확하게 달리 사용되지 않는 한, 단수 형태의 표현은 복수 형태의 의미를 포함한다. 본 설명에서, "포함" 또는 "구비"와 같은 표현은 어떤 특성들, 숫자들, 단계들, 동작들, 요소들, 이들의 일부 또는 조합을 가리키기 위한 것이며, 기술된 것 이외에 하나 또는 그 이상의 다른 특성, 숫자, 단계, 동작, 요소, 이들의 일부 또는 조합의 존재 또는 가능성을 배제하도록 해석되어서는 안 된다.In describing embodiments of the present invention, if it is judged that a detailed description of a known technology related to the present invention may unnecessarily obscure the gist of the present invention, the detailed description will be omitted. In addition, the terms described below are terms defined in consideration of their functions in the present invention, and may vary depending on the intention or custom of the user or operator. Therefore, the definitions should be made based on the contents throughout this specification. The terms used in the detailed description are only for describing embodiments of the present invention, and should never be limited. Unless clearly used otherwise, the singular form includes the plural form. In this description, expressions such as "comprises" or "comprising" are intended to indicate certain features, numbers, steps, operations, elements, parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, parts or combinations thereof other than those described.

또한, 제1, 제2 등의 용어는 다양한 구성 요소들을 설명하는데 사용될 수 있지만, 상기 구성 요소들은 상기 용어들에 의해 한정되어서는 안 된다. 상기 용어들은 하나의 구성 요소를 다른 구성 요소로부터 구별하는 목적으로 사용될 수 있다. 예를 들어, 본 발명의 권리 범위를 벗어나지 않으면서 제1 구성 요소는 제2 구성 요소로 명명될 수 있고, 유사하게 제2 구성 요소도 제1 구성 요소로 명명될 수 있다.Also, while the terms first, second, etc. may be used to describe various components, the components should not be limited by the terms. The terms may be used to distinguish one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.

그래프는 G = (V, E)로 나타내며, V는 정점(Vertex)의 집합(즉, V={v1, v2, ??, vn})을 나타내고, E는 간선(Edge)의 집합(즉, E=(e1, e2, ??, en})을 나타낸다. 각 정점에는 정점의 특징이나 정보가 포함될 수 있으며, 이와 유사하게 각 간선에는 간선의 정보와 정점 간의 관계와 같은 정보가 포함될 수 있다. 그리고, 그래프 스트림은 시간에 따라 정점의 삽입과 삭제, 간선의 삽입과 삭제가 계속적으로 발생하는 그래프 이벤트의 흐름이다. A graph is represented as G = (V, E), where V represents a set of vertices (i.e., V={v1, v2, ??, vn}) and E represents a set of edges (i.e., E=(e1, e2, ??, en}). Each vertex can contain characteristics or information about the vertex, and similarly, each edge can contain information such as information about the edge and the relationship between vertices. In addition, a graph stream is a flow of graph events in which insertion and deletion of vertices and insertion and deletion of edges continuously occur over time.

도 1은 개시되는 일 실시예에서 그래프 스트림을 나타낸 도면이다. 도 1을 참조하면, 시간 t=0에서 t=1로 변할 때, 그래프에 정점(A2)이 추가되고 그에 따라 간선(A2, B1)이 추가되며, 시간 t=1에서 t=2로 변할 때, 그래프에 간선(C2, D1)이 삭제된 것을 볼 수 있다. 한편, 이러한 그래프 스트림은 서로 다른 기기(예를 들어, 클라우드 서버, SNS 서버 등)에서 각각 생성될 수 있다. 또한, 각 그래프 스트림은 여러 데이터베이스에 분산 저장될 수 있다. FIG. 1 is a diagram illustrating a graph stream in an embodiment disclosed. Referring to FIG. 1, when changing from time t=0 to t=1, a vertex (A2) is added to the graph and an edge (A2, B1) is added accordingly, and when changing from time t=1 to t=2, an edge (C2, D1) is deleted from the graph. Meanwhile, these graph streams can be generated in different devices (e.g., cloud servers, SNS servers, etc.). In addition, each graph stream can be distributed and stored in multiple databases.

도 2는 개시되는 일 실시예에서 서브 그래프 쿼리가 주어질 때 그래프 패턴 매칭 과정을 나타낸 도면이다. FIG. 2 is a diagram illustrating a graph pattern matching process when a subgraph query is given in one embodiment of the disclosure.

도 2를 참조하면, 도 2의 (b)와 같이 서브 그래프 쿼리(Query)가 주어지는 경우, 그래프 스트림의 각 타임 스탬프(t=0, t=1, t=2)에서 서브 그래프 쿼리와 매칭되는 그래프 패턴의 탐색 결과를 나타내었다. 여기서, 매칭은 그래프 패턴이 정확히 일치하는 것뿐만 아니라 대략적으로 일치하는 것도 포함한다.Referring to Fig. 2, when a subgraph query is given as in (b) of Fig. 2, the search result of a graph pattern matching the subgraph query at each time stamp (t=0, t=1, t=2) of the graph stream is shown. Here, the matching includes not only an exact match of the graph pattern but also an approximate match.

타임 스탬프 t=0인 경우, 서브 그래프 쿼리와 매칭되는 그래프 패턴의 수는 2개이다. 타임 스탬프 t=1인 경우, 그래프에 정점(A2)이 추가되고 간선(A2, B1)이 추가됨에 따라 서브 그래프 쿼리와 매칭되는 그래프 패턴의 수는 4개로 증가하였다. 타임 스탬프 t=2인 경우, 그래프에 간선(C2, D1)이 삭제됨에 따라 서브 그래프 쿼리와 매칭되는 그래프 패턴의 수는 다시 2개로 감소한 것을 볼 수 있다. When the timestamp t=0, the number of graph patterns matching the subgraph query is 2. When the timestamp t=1, the number of graph patterns matching the subgraph query increases to 4 as a vertex (A2) and an edge (A2, B1) are added to the graph. When the timestamp t=2, the number of graph patterns matching the subgraph query decreases again to 2 as an edge (C2, D1) is deleted from the graph.

도 3은 본 발명의 일 실시예에 따른 서브 그래프 매칭 장치를 나타낸 블록도이다. FIG. 3 is a block diagram illustrating a subgraph matching device according to one embodiment of the present invention.

도 3을 참조하면, 서브 그래프 매칭 장치(100)는 직렬화 모듈(102), 그래프 복원 모듈(104), 및 매칭 모듈(106)을 포함할 수 있다. 예시적인 실시예에서, 서브 그래프 매칭 장치(100)는 다중 그래프 스트림 환경에서 서브 그래프 매칭을 수행하기 위한 것일 수 있으나, 이에 한정되는 것은 아니다. Referring to FIG. 3, the subgraph matching device (100) may include a serialization module (102), a graph restoration module (104), and a matching module (106). In an exemplary embodiment, the subgraph matching device (100) may be configured to perform subgraph matching in a multi-graph stream environment, but is not limited thereto.

직렬화 모듈(102)은 복수 개의 그래프 데이터를 획득할 수 있다. 여기서, 각 그래프 데이터는 G = (V, E)로 표현되는 그래프일 수 있다. V는 정점(Vertex)의 집합이고, E는 간선(Edge)의 집합이다. The serialization module (102) can obtain a plurality of graph data. Here, each graph data can be a graph expressed as G = (V, E). V is a set of vertices, and E is a set of edges.

예시적인 실시예에서, 직렬화 모듈(102)은 서로 다른 소스 기기(예를 들어, 데이터베이스 또는 클라우드 서버 등)로부터 복수 개의 그래프 데이터를 병렬적으로 수신할 수 있다. 그러나, 직렬화 모듈(102)이 복수 개의 그래프 데이터를 획득하는 방식이 이에 한정되는 것은 아니다.In an exemplary embodiment, the serialization module (102) can receive multiple graph data in parallel from different source devices (e.g., a database or a cloud server, etc.). However, the manner in which the serialization module (102) obtains the multiple graph data is not limited thereto.

직렬화 모듈(102)은 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성할 수 있다. 이때, 직렬화 모듈(102)은 각 그래프 데이터에서 간선(Edge) 정보들을 추출하고, 추출한 간선 정보들을 일렬로 연결하여 하나의 데이터 스트림을 생성할 수 있다. The serialization module (102) can combine multiple graph data to generate a single data stream. At this time, the serialization module (102) can extract edge information from each graph data and connect the extracted edge information in a row to generate a single data stream.

도 4는 본 발명의 일 실시예에서 직렬화 모듈(102)이 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성하는 상태를 개략적으로 나타낸 도면이다. 도 4를 참조하면, 직렬화 모듈(102)은 서로 다른 3개의 소스 기기로부터 각각 제1 그래프 데이터(G1), 제2 그래프 데이터(G2), 및 제3 그래프 데이터(G3)를 획득할 수 있다. FIG. 4 is a diagram schematically illustrating a state in which a serialization module (102) combines multiple graph data to generate a single data stream in one embodiment of the present invention. Referring to FIG. 4, the serialization module (102) can obtain first graph data (G1), second graph data (G2), and third graph data (G3) from three different source devices, respectively.

직렬화 모듈(102)은 제1 그래프 데이터(G1)로부터 제1 간선 정보(e1-e2-e3)를 추출하고, 제2 그래프 데이터(G2)로부터 제2 간선 정보(e4-e5-e6)를 추출하며, 제3 그래프 데이터(G3)로부터 제3 간선 정보(e4-e5-e6)를 추출할 수 있다. 그리고, 직렬화 모듈(102)은 제1 간선 정보, 제2 간선 정보, 및 제3 간선 정보를 연결하여 하나의 데이터 스트림을 생성할 수 있다. The serialization module (102) can extract first edge information (e1-e2-e3) from the first graph data (G1), second edge information (e4-e5-e6) from the second graph data (G2), and third edge information (e4-e5-e6) from the third graph data (G3). In addition, the serialization module (102) can generate one data stream by connecting the first edge information, the second edge information, and the third edge information.

여기서, 각 간선 정보에는 그 특성 상 각 간선과 연관되는 정점의 정보들도 포함된다. 즉, 하나의 간선은 정점의 쌍으로 이루어지므로, 간선 정보에는 해당 간선과 연관되는 정점의 정보도 포함되게 된다.Here, each edge information also includes information on vertices associated with each edge due to its nature. That is, since one edge is composed of a pair of vertices, the edge information also includes information on vertices associated with the corresponding edge.

개시되는 실시예에서는, 직렬화 모듈(102)을 통해 병렬적으로 수신되는 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성하기 때문에, 병렬적으로 수신되는 그래프 데이터에 대한 동기화 문제가 발생하지 않게 된다.In the disclosed embodiment, since multiple graph data received in parallel through the serialization module (102) are combined to generate a single data stream, a synchronization problem for graph data received in parallel does not occur.

직렬화 모듈(102)은 하나의 데이터 스트림을 기 설정된 단위의 배치 사이즈(batch size)로 분할 할 수 있다. 여기서, 배치 사이즈는 서브 그래프 매칭 장치(100)의 메모리 크기에 따라 결정될 수 있다. The serialization module (102) can divide one data stream into batch sizes of preset units. Here, the batch size can be determined according to the memory size of the subgraph matching device (100).

그래프 복원 모듈(104)은 직렬화 모듈(102)에서 분할된 각 데이터 스트림(즉, 배치 사이즈에 대응하는 크기의 데이터 스트림)을 그래프 스트림으로 변환할 수 있다. 즉, 직렬화 모듈(102)에서 분할된 각 데이터 스트림은 간선 정보들만이 연결된 형태이므로, 이를 그래프 구조를 가지는 그래프 스트림으로 복원시킬 수 있다. The graph restoration module (104) can convert each data stream (i.e., a data stream of a size corresponding to the batch size) divided by the serialization module (102) into a graph stream. That is, since each data stream divided by the serialization module (102) is in a form in which only edge information is connected, it can be restored into a graph stream having a graph structure.

그래프 복원 모듈(104)은 분할된 각 데이터 스트림의 간선 정보에 기반하여 각 데이터 스트림을 그래프 스트림으로 변환할 수 있다. 즉, 간선 정보에는 그 특성 상 각 간선과 연관되는 정점의 정보들도 포함되므로, 간선 정보와 정점 정보들을 기반으로 해당 데이터 스트림을 그래프 스트림으로 변환할 수 있다. The graph restoration module (104) can convert each data stream into a graph stream based on the edge information of each divided data stream. That is, since the edge information includes information on vertices associated with each edge due to its nature, the data stream can be converted into a graph stream based on the edge information and vertex information.

그래프 복원 모듈(104)은 각 데이터 스트림에 대해 타임 스태프마다 그래프 데이터를 복원하여 그래프 스트림으로 변환할 수 있다. 변환된 그래프 스트림은 서브 그래프 매칭의 대상이 되는 그래프 스트림으로 업데이트 될 수 있다. 그래프 스트림은 타임 스탬프마다 그에 대응하는 그래프를 포함할 수 있다. The graph restoration module (104) can restore graph data for each data stream at each time stamp and convert it into a graph stream. The converted graph stream can be updated as a graph stream that is the target of subgraph matching. The graph stream can include a corresponding graph for each time stamp.

매칭 모듈(106)은 그래프 복원 모듈(104)에서 변환된 그래프 스트림에 대해 기 설정된 서브 그래프 쿼리에 매칭되는 그래프 패턴을 검색할 수 있다. The matching module (106) can search for a graph pattern matching a preset subgraph query for the graph stream converted by the graph restoration module (104).

매칭 모듈(106)은 그래프 스트림에서 타임 스탬프의 변화에 따라 변경(예를 들어, 추가 또는 삭제 등)되는 간선 정보를 추적할 수 있다. 이때, 간선 정보를 추적하기 위해 스트림 그래프 테이블(Stream Graph Table)이 사용될 수 있다. 매칭 모듈(106)은 이전 검색 결과와 변경된 간선 정보를 이용하여 새로운 검색 결과를 출력할 수 있다. The matching module (106) can track edge information that is changed (e.g., added or deleted) according to changes in the timestamp in the graph stream. At this time, a stream graph table can be used to track the edge information. The matching module (106) can output a new search result using the previous search result and the changed edge information.

여기서, 스트림 그래프 테이블은 그래프가 변경될 때마다 업데이트되는 그래프의 간선에 대한 정보를 저장하는 테이블일 수 있다. 스트림 그래프 테이블은 그래프에서 추가 또는 삭제되는 간선의 위치를 표시하기 위해 마련될 수 있다. Here, the stream graph table may be a table that stores information about edges of a graph that is updated whenever the graph changes. The stream graph table may be provided to indicate the location of edges that are added or deleted from the graph.

도 5는 본 발명의 일 실시예에 따른 스트림 그래프 테이블을 나타낸 도면이다. 여기서, 스트림 그래프 테이블은 도 1에 도시된 그래프 스트림에서 t=0인 경우의 그래프에 대한 테이블을 나타내었다. FIG. 5 is a diagram illustrating a stream graph table according to one embodiment of the present invention. Here, the stream graph table represents a table for a graph when t=0 in the graph stream illustrated in FIG. 1.

도 5를 참조하면, 스트림 그래프 테이블은 에지 레이블(edge label), 추가 인덱스(added index), 추가 에지(added edges), 제거 인덱스(removed index), 및 제거 에지(removed edges)의 5개의 칼럼(columns)으로 구성될 수 있다. Referring to FIG. 5, the stream graph table can be composed of five columns: edge label, added index, added edges, removed index, and removed edges.

에지 레이블edge label)은 그래프의 각 간선의 출발 및 도착 정점 레이블의 집합을 저장하는 칼럼일 수 있다. 즉, 에지 레이블은 각 간선의 출발 정점의 레이블과 도착 정점의 레이블로 이루어질 수 있다. 예를 들어, 에지 레이블이 AB인 경우, 해당 간선(에지)은 A라는 레이블을 갖는 정점을 출발 정점으로 하고 B라는 레이블을 갖는 정점을 도착 정점으로 하는 간선이 된다. Edge label) can be a column that stores a set of starting and ending vertex labels of each edge of the graph. That is, the edge label can be composed of the label of the starting vertex and the label of the ending vertex of each edge. For example, if the edge label is AB, the edge becomes an edge that has the vertex with the label A as the starting vertex and the vertex with the label B as the ending vertex.

추가 인덱스(added index)는 그래프에서 추가된 간선의 인덱스를 저장하는 칼럼일 수 있다. 여기서, 추가 인덱스의 값이 -1이면 추가된 간선이 업데이트 되지 않았음(즉, 추가된 간선이 없음)을 의미할 수 있다. The added index can be a column that stores the index of the edge added to the graph. Here, if the value of the added index is -1, it can mean that the added edge was not updated (i.e., there is no added edge).

추가 에지(added edges)는 각 에지 레이블과 관련된 간선의 집합을 저장하는 칼럼일 수 있다. 이때, 추가 에지는 각 에지 레이블과 관련된 간선의 출발 정점의 ID(해당 정점의 고유번호)와 도착 정점의 ID로 이루어질 수 있다. 예를 들어, 에지 레이블 AB의 경우, AB와 관련된 간선은 1개 이며, 해당 간선은 A라는 레이블을 갖는 정점의 ID인 1과 B라는 레이블을 갖는 정점의 ID인 2로서 (1, 2)로 표현되게 된다. Added edges can be a column that stores a set of edges related to each edge label. At this time, the added edge can be composed of the ID of the starting vertex (a unique number of the vertex) and the ID of the destination vertex of the edge related to each edge label. For example, in the case of edge label AB, there is one edge related to AB, and the edge is expressed as (1, 2) as 1, which is the ID of the vertex with the label A, and 2, which is the ID of the vertex with the label B.

그리고, 에지 레이블이 BC인 경우, BC와 관련된 간선은 2개이며, 그 중 하나의 간선은 (2, 4)로 표현되는데 이는 B라는 레이블을 갖는 정점의 ID인 2와 C라는 레이블을 갖는 정점의 ID 4로 이루어진다. 그리고, 다른 하나의 간선은 (2, 5)로 표현되는데 이는 B라는 레이블을 갖는 정점의 ID인 2와 C라는 레이블을 갖는 정점의 ID 5로 이루어진다.And, if the edge label is BC, there are two edges related to BC, and one of them is expressed as (2, 4), which is composed of 2, which is the ID of the vertex with the label B, and 4, which is the ID of the vertex with the label C. And, the other edge is expressed as (2, 5), which is composed of 2, which is the ID of the vertex with the label B, and 5, which is the ID of the vertex with the label C.

제거 인덱스(removed index)는 그래프에서 제거된 간선의 인덱스를 저장하는 칼럼일 수 있다. 여기서, 제거 인덱스의 값이 -1이면 제거된 간선이 업데이트 되지 않았음(즉, 제거된 간선이 없음)을 의미할 수 있다. The removed index may be a column that stores the index of the edge removed from the graph. Here, if the value of the removed index is -1, it may mean that the removed edge was not updated (i.e., there is no removed edge).

제거 에지(removed edges)는 각 에지 레이블과 관련하여 제거된 간선의 집합을 저장하는 칼럼일 수 있다. 제거 에지는 각 에지 레이블과 관련하여 제거된 간선의 출발 정점의 ID와 도착 정점의 ID로 이루어질 수 있다. Removed edges may be a column that stores a set of edges removed for each edge label. Removed edges may consist of the ID of the starting vertex and the ID of the destination vertex of the removed edge for each edge label.

한편, 서브 그래프 쿼리가 주어지는 경우, 매칭 모듈(106)은 그래프 스트림에 대한 스트림 그래프 테이블에 기반하여 서브 그래프 쿼리와 매칭되는 간선 정보를 추출할 수 있다. Meanwhile, when a subgraph query is given, the matching module (106) can extract edge information matching the subgraph query based on the stream graph table for the graph stream.

구체적으로, 서브 그래프 쿼리(query q)가 주어지는 경우, 매칭 모듈(106)은 서브 그래프 쿼리에 대해 순차 간선(sequential edges) 정보를 생성할 수 있다. 여기서, 순차 간선 정보는 서브 그래프 쿼리에 포함된 각 간선들의 시퀀스에 대한 정보로서, 각 간선은 해당 간선의 출발 정점의 ID와 도착 정점의 ID로 구성될 수 있다. 도 6은 본 발명의 일 실시예에서 서브 그래프 쿼리(query q)에 대한 순차 간선 정보를 나타낸 도면이다. Specifically, when a subgraph query (query q) is given, the matching module (106) can generate sequential edges information for the subgraph query. Here, the sequential edges information is information about the sequence of each edge included in the subgraph query, and each edge can be composed of the ID of the starting vertex and the ID of the destination vertex of the corresponding edge. FIG. 6 is a diagram illustrating sequential edges information for a subgraph query (query q) according to one embodiment of the present invention.

매칭 모듈(106)은 스트림 그래프 테이블의 에지 레이블(edge label) 및 추가 에지(added edges) 칼럼에서 서브 그래프 쿼리(query q)에 대한 순차 간선 정보와 일치하는 항목을 검색할 수 있다. 도 7은 본 발명의 일 실시예에서 타임 스탬프 t=0에서 서브 그래프 쿼리와 매칭되는 간선 정보를 검색하는 과정을 나타낸 도면이다. The matching module (106) can search for items matching the sequential edge information for the subgraph query (query q) in the edge label and added edges columns of the stream graph table. FIG. 7 is a diagram illustrating a process of searching for edge information matching the subgraph query at time stamp t=0 in one embodiment of the present invention.

도 7을 참조하면, 매칭 모듈(106)은 서브 그래프 쿼리(query q)에 대한 순차 간선 정보 중 간선 (1, 2)와 매칭되는 간선을 그와 대응되는 에지 레이블(edge label) AB의 추가 에지(added edges) 칼럼에서 검색할 수 있다. 그리고, 매칭 모듈(106)은 순차 간선 정보 중 간선 (2, 3)과 매칭되는 간선을 그와 대응되는 에지 레이블(edge label) BB의 추가 에지(added edges) 칼럼에서 검색할 수 있다. Referring to FIG. 7, the matching module (106) can search for an edge matching edge (1, 2) among the sequential edge information for a subgraph query (query q) in the added edges column of the corresponding edge label AB. In addition, the matching module (106) can search for an edge matching edge (2, 3) among the sequential edge information in the added edges column of the corresponding edge label BB.

이러한 방식으로 매칭 모듈(106)은 서브 그래프 쿼리(query q)에 대한 순차 간선 정보와 매칭되는 항목을 검색하여 타임 스탬프 t=0에서 서브 그래프 매칭 결과 테이블을 생성할 수 있다. 여기서, 서브 그래프 매칭 결과 테이블은 서브 그래프 쿼리의 그래프 패턴과 매칭되는 그래프 패턴(P1, P2)에서 각 정점의 ID를 순차적으로 배열한 것일 수 있다. In this way, the matching module (106) can search for items matching the sequential edge information for the subgraph query (query q) and generate a subgraph matching result table at time stamp t=0. Here, the subgraph matching result table can be a table in which the IDs of each vertex are sequentially arranged in the graph pattern (P1, P2) matching the graph pattern of the subgraph query.

도 8은 본 발명의 일 실시예에서 그래프 스트림이 타임 스탬프 t=0에서 타임 스탬프 t=1로 바뀔 때의 스트림 그래프 테이블을 나타낸 도면이다. 도 8을 참조하면, 타임 스탬프 t=1에서는 에지 레이블(edge label) AB와 관련된 간선이 추가 되었는 바, 에지 레이블(edge label) AB의 추가 에지(added edges) 칼럼에서 해당 간선에 대한 정보 즉 (7,2)를 추가할 수 있다. FIG. 8 is a diagram showing a stream graph table when a graph stream changes from a time stamp t=0 to a time stamp t=1 in one embodiment of the present invention. Referring to FIG. 8, at a time stamp t=1, an edge related to an edge label AB is added, and information about the edge, i.e., (7,2), can be added in the added edges column of the edge label AB.

도 9는 본 발명의 일 실시예에서 타임 스탬프 t=1에서 서브 그래프 쿼리와 매칭되는 간선 정보를 검색하는 과정을 나타낸 도면이다. 도 9를 참조하면, 매칭 모듈(106)은 타임 스탬프 t=1에서 에지 레이블(edge label) AB의 추가 에지(added edges) 칼럼에 간선 (7,2)가 추가되었으므로, 타임 스탬프 t=0에서 서브 그래프 매칭 결과 테이블에 새로 추가된 간선 (7,2)에 대한 서브 그래프 매칭 결과 테이블를 합성하여 타임 스탬프 t=1에서 서브 그래프 매칭 결과 테이블을 생성할 수 있다. 여기서, 서브 그래프 쿼리와 매칭되는 그래프 패턴(P1, P2, P3, P4)이 4개로 증가한 것을 볼 수 있다.FIG. 9 is a diagram illustrating a process of searching for edge information matching a subgraph query at time stamp t=1 according to one embodiment of the present invention. Referring to FIG. 9, since edge (7,2) is added to the added edges column of edge label AB at time stamp t=1, the matching module (106) synthesizes a subgraph matching result table for the edge (7,2) newly added to the subgraph matching result table at time stamp t=0 to generate a subgraph matching result table at time stamp t=1. Here, it can be seen that the number of graph patterns (P1, P2, P3, P4) matching the subgraph query has increased to four.

도 10은 본 발명의 일 실시예에서 그래프 스트림이 타임 스탬프 t=1에서 타임 스탬프 t=2로 바뀔 때의 스트림 그래프 테이블을 나타낸 도면이고, 도 11은 본 발명의 일 실시예에서 타임 스탬프 t=2에서 서브 그래프 쿼리와 매칭되는 간선 정보를 검색하는 과정을 나타낸 도면이다. FIG. 10 is a diagram showing a stream graph table when a graph stream changes from time stamp t=1 to time stamp t=2 in one embodiment of the present invention, and FIG. 11 is a diagram showing a process of searching for edge information matching a subgraph query at time stamp t=2 in one embodiment of the present invention.

도 10 및 도 11을 참조하면, 타임 스탬프 t=2에서 에지 레이블(edge label) CD와 관련된 간선이 제거 되었는 바, 에지 레이블(edge label) CD의 제거 에지(removed edges) 칼럼에 해당 간선에 대한 정보 즉, (5, 6)을 추가할 수 있다. Referring to FIG. 10 and FIG. 11, an edge associated with edge label CD is removed at time stamp t=2, and information about the corresponding edge, i.e., (5, 6), can be added to the removed edges column of edge label CD.

그리고, 매칭 모듈(106)은 타임 스탬프 t=1에서 서브 그래프 매칭 결과 테이블에 상기 삭제된 간선에 대한 정보 부분을 삭제하여 타임 스탬프 t=2에서 서브 그래프 매칭 결과 테이블을 생성할 수 있다. 여기서, 서브 그래프 쿼리와 매칭되는 그래프 패턴(P1, P3)이 2개로 다시 감소한 것을 볼 수 있다.And, the matching module (106) can delete the information part about the deleted edge in the subgraph matching result table at time stamp t=1 to generate the subgraph matching result table at time stamp t=2. Here, it can be seen that the number of graph patterns (P1, P3) matching the subgraph query is reduced to two again.

개시되는 실시예에 의하면, 그래프 스트림에 대해 스트림 그래프 테이블을 생성하여 그래프가 업데이트될 때마다 추가 또는 삭제되는 간선에 대한 정보를 저장함으로써, 서브 그래프 쿼리에 대해 이전 서브 그래프 매칭 결과를 기반으로 새로 추가 또는 삭제된 간선에 대한 매칭 결과만을 반환하면 되는 바, 서브 그래프 매칭 작업 효율을 향상시키면서 업무 로드를 줄일 수 있게 된다. According to the disclosed embodiment, by generating a stream graph table for a graph stream and storing information about edges added or deleted whenever the graph is updated, only matching results for newly added or deleted edges are returned based on previous subgraph matching results for subgraph queries, thereby improving the efficiency of subgraph matching work while reducing the workload.

본 명세서에서 모듈이라 함은, 본 발명의 기술적 사상을 수행하기 위한 하드웨어 및 상기 하드웨어를 구동하기 위한 소프트웨어의 기능적, 구조적 결합을 의미할 수 있다. 예컨대, 상기 "모듈"은 소정의 코드와 상기 소정의 코드가 수행되기 위한 하드웨어 리소스의 논리적인 단위를 의미할 수 있으며, 반드시 물리적으로 연결된 코드를 의미하거나, 한 종류의 하드웨어를 의미하는 것은 아니다.In this specification, the term "module" may mean a functional and structural combination of hardware for performing the technical idea of the present invention and software for operating the hardware. For example, the "module" may mean a logical unit of a given code and hardware resources for performing the given code, and does not necessarily mean physically connected code or a type of hardware.

도 12는 본 발명의 일 실시예에 따른 서브 그래프 매칭 방법을 나타낸 흐름도이다. 도시된 흐름도에서는 상기 방법을 복수 개의 단계로 나누어 기재하였으나, 적어도 일부의 단계들은 순서를 바꾸어 수행되거나, 다른 단계와 결합되어 함께 수행되거나, 생략되거나, 세부 단계들로 나뉘어 수행되거나, 또는 도시되지 않은 하나 이상의 단계가 부가되어 수행될 수 있다.Fig. 12 is a flowchart illustrating a subgraph matching method according to one embodiment of the present invention. In the illustrated flowchart, the method is described by dividing it into a plurality of steps, but at least some of the steps may be performed in a different order, combined with other steps and performed together, omitted, divided into detailed steps, or performed by adding one or more steps that are not illustrated.

도 12를 참조하면, 서브 그래프 매칭 장치(100)는 복수 개의 그래프 데이터를 병렬적으로 획득할 수 있다(S 101). 이때, 복수 개의 그래프 데이터는 서로 다른 소스 기기로부터 획득할 수 있다. Referring to Fig. 12, the subgraph matching device (100) can obtain multiple graph data in parallel (S 101). At this time, the multiple graph data can be obtained from different source devices.

다음으로, 서브 그래프 매칭 장치(100)는 복수 개의 그래프 데이터를 결합하여 하나의 데이터 스트림을 생성할 수 있다(S 103). 서브 그래프 매칭 장치(100)는 각 그래프 데이터에서 간선(Edge) 정보들을 추출하고, 추출한 간선 정보들을 일렬로 연결하여 하나의 데이터 스트림을 생성할 수 있다. Next, the subgraph matching device (100) can combine multiple graph data to generate one data stream (S 103). The subgraph matching device (100) can extract edge information from each graph data and connect the extracted edge information in a row to generate one data stream.

다음으로, 서브 그래프 매칭 장치(100)는 하나의 데이터 스트림을 기 설정된 단위의 배치 사이즈(batch size)로 분할 할 수 있다(S 105).Next, the subgraph matching device (100) can divide one data stream into batch sizes of preset units (S 105).

다음으로, 서브 그래프 매칭 장치(100)는 분할된 각 데이터 스트림의 간선 정보에 기반하여 분할된 각 데이터 스트림을 그래프 스트림으로 변환할 수 있다(S 107).Next, the subgraph matching device (100) can convert each divided data stream into a graph stream based on edge information of each divided data stream (S 107).

다음으로, 서브 그래프 매칭 장치(100)는 그래프 스트림에서 타임 스탬프의 변화에 따라 변경되는 간선 정보를 추적하여 스트림 그래프 테이블을 생성할 수 있다(S 109). 서브 그래프 매칭 장치(100)는 그래프 스트림의 각 타임 스탬프마다 스트림 그래프 테이블을 업데이트할 수 있다. Next, the subgraph matching device (100) can generate a stream graph table by tracking edge information that changes according to changes in timestamps in the graph stream (S 109). The subgraph matching device (100) can update the stream graph table for each timestamp of the graph stream.

다음으로, 서브 그래프 매칭 장치(100)는 기 설정된 서브 그래프 쿼리에 대해 순차 간선(sequential edges) 정보를 생성할 수 있다(S 111).Next, the subgraph matching device (100) can generate sequential edges information for a preset subgraph query (S 111).

다음으로, 서브 그래프 매칭 장치(100)는 스트림 그래프 테이블의 에지 레이블(edge label) 및 추가 에지(added edges) 칼럼에서 서브 그래프 쿼리(query q)에 대한 순차 간선 정보와 매칭하는 항목을 검색하여 서브 그래프 매칭 결과 테이블을 생성할 수 있다(S 113).Next, the subgraph matching device (100) can search for items matching the sequential edge information for the subgraph query (query q) in the edge label and added edges columns of the stream graph table to generate a subgraph matching result table (S 113).

도 13은 예시적인 실시예들에서 사용되기에 적합한 컴퓨팅 장치를 포함하는 컴퓨팅 환경(10)을 예시하여 설명하기 위한 블록도이다. 도시된 실시예에서, 각 컴포넌트들은 이하에 기술된 것 이외에 상이한 기능 및 능력을 가질 수 있고, 이하에 기술된 것 이외에도 추가적인 컴포넌트를 포함할 수 있다.FIG. 13 is a block diagram illustrating a computing environment (10) including a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have different functions and capabilities other than those described below, and may include additional components other than those described below.

도시된 컴퓨팅 환경(10)은 컴퓨팅 장치(12)를 포함한다. 일 실시예에서, 컴퓨팅 장치(12)는 서브 그래프 매칭 장치(100)일 수 있다. The illustrated computing environment (10) includes a computing device (12). In one embodiment, the computing device (12) may be a subgraph matching device (100).

컴퓨팅 장치(12)는 적어도 하나의 프로세서(14), 컴퓨터 판독 가능 저장 매체(16) 및 통신 버스(18)를 포함한다. 프로세서(14)는 컴퓨팅 장치(12)로 하여금 앞서 언급된 예시적인 실시예에 따라 동작하도록 할 수 있다. 예컨대, 프로세서(14)는 컴퓨터 판독 가능 저장 매체(16)에 저장된 하나 이상의 프로그램들을 실행할 수 있다. 상기 하나 이상의 프로그램들은 하나 이상의 컴퓨터 실행 가능 명령어를 포함할 수 있으며, 상기 컴퓨터 실행 가능 명령어는 프로세서(14)에 의해 실행되는 경우 컴퓨팅 장치(12)로 하여금 예시적인 실시예에 따른 동작들을 수행하도록 구성될 수 있다.A computing device (12) includes at least one processor (14), a computer-readable storage medium (16), and a communication bus (18). The processor (14) may cause the computing device (12) to operate in accordance with the exemplary embodiments described above. For example, the processor (14) may execute one or more programs stored in the computer-readable storage medium (16). The one or more programs may include one or more computer-executable instructions, which, when executed by the processor (14), may be configured to cause the computing device (12) to perform operations in accordance with the exemplary embodiments.

컴퓨터 판독 가능 저장 매체(16)는 컴퓨터 실행 가능 명령어 내지 프로그램 코드, 프로그램 데이터 및/또는 다른 적합한 형태의 정보를 저장하도록 구성된다. 컴퓨터 판독 가능 저장 매체(16)에 저장된 프로그램(20)은 프로세서(14)에 의해 실행 가능한 명령어의 집합을 포함한다. 일 실시예에서, 컴퓨터 판독 가능 저장 매체(16)는 메모리(랜덤 액세스 메모리와 같은 휘발성 메모리, 비휘발성 메모리, 또는 이들의 적절한 조합), 하나 이상의 자기 디스크 저장 디바이스들, 광학 디스크 저장 디바이스들, 플래시 메모리 디바이스들, 그 밖에 컴퓨팅 장치(12)에 의해 액세스되고 원하는 정보를 저장할 수 있는 다른 형태의 저장 매체, 또는 이들의 적합한 조합일 수 있다.A computer-readable storage medium (16) is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program (20) stored in the computer-readable storage medium (16) includes a set of instructions executable by the processor (14). In one embodiment, the computer-readable storage medium (16) may be a memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device (12) and capable of storing desired information, or a suitable combination thereof.

통신 버스(18)는 프로세서(14), 컴퓨터 판독 가능 저장 매체(16)를 포함하여 컴퓨팅 장치(12)의 다른 다양한 컴포넌트들을 상호 연결한다.A communication bus (18) interconnects various other components of the computing device (12), including the processor (14) and computer-readable storage media (16).

컴퓨팅 장치(12)는 또한 하나 이상의 입출력 장치(24)를 위한 인터페이스를 제공하는 하나 이상의 입출력 인터페이스(22) 및 하나 이상의 네트워크 통신 인터페이스(26)를 포함할 수 있다. 입출력 인터페이스(22) 및 네트워크 통신 인터페이스(26)는 통신 버스(18)에 연결된다. 입출력 장치(24)는 입출력 인터페이스(22)를 통해 컴퓨팅 장치(12)의 다른 컴포넌트들에 연결될 수 있다. 예시적인 입출력 장치(24)는 포인팅 장치(마우스 또는 트랙패드 등), 키보드, 터치 입력 장치(터치패드 또는 터치스크린 등), 음성 또는 소리 입력 장치, 다양한 종류의 센서 장치 및/또는 촬영 장치와 같은 입력 장치, 및/또는 디스플레이 장치, 프린터, 스피커 및/또는 네트워크 카드와 같은 출력 장치를 포함할 수 있다. 예시적인 입출력 장치(24)는 컴퓨팅 장치(12)를 구성하는 일 컴포넌트로서 컴퓨팅 장치(12)의 내부에 포함될 수도 있고, 컴퓨팅 장치(12)와는 구별되는 별개의 장치로 컴퓨팅 장치(12)와 연결될 수도 있다.The computing device (12) may also include one or more input/output interfaces (22) that provide interfaces for one or more input/output devices (24) and one or more network communication interfaces (26). The input/output interfaces (22) and the network communication interfaces (26) are coupled to the communication bus (18). The input/output devices (24) may be coupled to other components of the computing device (12) via the input/output interfaces (22). Exemplary input/output devices (24) may include input devices such as a pointing device (such as a mouse or trackpad), a keyboard, a touch input device (such as a touchpad or a touchscreen), a voice or sound input device, various types of sensor devices and/or photographing devices, and/or output devices such as a display device, a printer, speakers, and/or a network card. The exemplary input/output devices (24) may be included within the computing device (12) as a component that constitutes the computing device (12), or may be coupled to the computing device (12) as a separate device distinct from the computing device (12).

이상에서 본 발명의 대표적인 실시예들을 상세하게 설명하였으나, 본 발명이 속하는 기술분야에서 통상의 지식을 가진 자는 상술한 실시예에 대하여 본 발명의 범주에서 벗어나지 않는 한도 내에서 다양한 변형이 가능함을 이해할 것이다. 그러므로 본 발명의 권리범위는 설명된 실시예에 국한되어 정해져서는 안 되며, 후술하는 특허청구범위뿐만 아니라 이 특허청구범위와 균등한 것들에 의해 정해져야 한다.Although representative embodiments of the present invention have been described in detail above, those skilled in the art will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the rights of the present invention should not be limited to the described embodiments, but should be determined not only by the claims described below but also by equivalents of the claims.

100 : 서브 그래프 매칭 장치
102 : 직렬화 모듈
104 : 그래프 복원 모듈
106 : 매칭 모듈100 : Subgraph Matching Device
102: Serialization module
104 : Graph Restoration Module
106 : Matching module

Claims

A serialization module that acquires multiple graph data and combines the acquired multiple graph data to generate a single data stream;
A graph restoration module that converts the above data stream into a graph stream; and
Includes a matching module that searches for a graph pattern matching a preset subgraph query for the above graph stream,
The above serialization module is a subgraph matching device that extracts edge information from each of the plurality of graph data and connects the extracted edge information in a row to create a single data stream.

delete

In claim 1,
The above serialization module divides the one data stream into batch sizes of preset units,
The above graph restoration module is a subgraph matching device that restores graph data for each time stamp based on edge information of each divided data stream and converts it into a graph stream.

In claim 1,
The above matching module,
A subgraph matching device that generates a stream graph table by tracking edge information that changes according to a change in a timestamp in the above graph stream, and searches for a graph pattern matching the above subgraph query using the generated stream graph table.

In claim 4,
The above stream graph table is,
Contains columns of edge label, added index, added edges, removed index, and removed edges,
A subgraph matching device, wherein the edge label is a column storing a set of departure and arrival vertex labels of each edge of the graph, the added index is a column storing an index of an edge added to the graph, the added edges is a column storing a set of edges related to each edge label, the removed index is a column storing an index of an edge removed from the graph, and the removed edges is a column storing a set of edges removed related to each edge label.

In claim 5,
The above matching module,
Generate sequential edges information for the above subgraph query,
The above sequential edge information is information about the sequence of each edge included in the subgraph query, and the information for each edge is composed of the ID of the starting vertex and the ID of the arrival vertex of the corresponding edge, a subgraph matching device.

In claim 6,
The above matching module,
A subgraph matching device that searches for items matching sequential edge information for the subgraph query in the edge label and added edges columns of the above stream graph table.

In claim 7,
The above matching module,
Based on the results of the above search, a subgraph matching result table is generated for each time stamp of the graph stream,
The above subgraph matching result table is a subgraph matching device in which the IDs of each vertex in a graph pattern matching the graph pattern of the above subgraph query are arranged sequentially.

one or more processors, and
A method performed on a computing device having a memory storing one or more programs executed by one or more processors,
A step of acquiring multiple graph data;
A step of combining the acquired plurality of graph data to generate a single data stream;
a step of converting the above data stream into a graph stream; and
Comprising a step of searching for a graph pattern matching a preset subgraph query for the above graph stream,
The step of generating the above one data stream is:
A step of extracting edge information from each of the above multiple graph data; and
A subgraph matching method comprising a step of generating a single data stream by connecting the extracted edge information in a row.

delete

In claim 9,
The above subgraph matching method is,
Further comprising a step of dividing the above one data stream into batch sizes of preset units,
The step of converting into the above graph stream is a subgraph matching method in which the graph data is restored for each time stamp for each data stream based on the edge information of each of the divided data streams and converted into a graph stream.

In claim 9,
The above search steps are:
A step of generating a stream graph table by tracking edge information that changes according to changes in timestamps in the above graph stream; and
A subgraph matching method, comprising a step of searching for a graph pattern matching the subgraph query using the generated stream graph table.

In claim 12,
The above stream graph table is,
Contains columns of edge label, added index, added edges, removed index, and removed edges,
A subgraph matching method, wherein the edge label is a column storing a set of departure and arrival vertex labels of each edge of the graph, the added index is a column storing an index of an edge added to the graph, the added edges is a column storing a set of edges related to each edge label, the removed index is a column storing an index of an edge removed from the graph, and the removed edges is a column storing a set of edges removed related to each edge label.

In claim 13,
The above search steps are:
Further comprising a step of generating sequential edges information for the above subgraph query,
The above sequential edge information is information about the sequence of each edge included in the subgraph query, and the information for each edge is composed of the ID of the starting vertex and the ID of the arrival vertex of the corresponding edge, a subgraph matching method.

In claim 14,
The above search steps are:
A subgraph matching method for searching for items matching sequential edge information for the subgraph query in the edge label and added edges columns of the above stream graph table.

In claim 15,
The above search steps are:
Further comprising the step of generating a subgraph matching result table for each time stamp of the graph stream according to the result of the above search,
A subgraph matching method in which the above subgraph matching result table sequentially arranges the IDs of each vertex in a graph pattern matching the graph pattern of the above subgraph query.

A computer program stored in a non-transitory computer readable storage medium,
The computer program comprises one or more instructions, which, when executed by a computing device having one or more processors, cause the computing device to:
A step of acquiring multiple graph data;
A step of combining the acquired plurality of graph data to generate a single data stream;
a step of converting the above data stream into a graph stream; and
Perform a step of searching for a graph pattern matching a pre-set subgraph query for the above graph stream,
The step of generating the above one data stream is:
A step of extracting edge information from each of the above multiple graph data; and
A computer program comprising a step of connecting the extracted edge information in a row to generate a single data stream.