CN115168361A - Label management method and device - Google Patents
- Publication number
- CN115168361A (CN202210847895.7A)
- Authority
- CN
- China
- Prior art keywords
- label
- tag
- static
- dynamic
- sql
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2308—Concurrency control
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
- G06F3/0486—Drag-and-drop
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The invention relates to the technical field of label management, and in particular to a label management method and device.
Background
A label is a semantic expression of a platform's business data. It can be a basic attribute of an object, or a characteristic attribute that describes the object and is derived by computing on and analyzing the raw data. Labels are widely used in application scenarios such as user profiling and product profiling.
Existing label management platforms define the label computation process visually in the interface, and use big data technologies such as Spark, Hive, and HBase to compute label data that meets business requirements from PB-scale data, which in turn provides the data foundation for user segmentation and user labels. A label management platform manages the full life cycle of labels, and its overall architecture has three layers: the label management layer, the label library, and the label service layer. The label life cycle is divided into creation, storage, and query. The label management layer is supported by a label engine and offers requirements analysts visual ways to define label rules, such as drag-and-drop, circle selection, and lightweight scripts. The label engine automatically parses the label rules, applies the labels, and writes the results to tables. It also provides complete label management and usage statistics functions, so that the full life cycle of labels is managed. The label library is the carrier of labels: it accumulates valuable data and provides resource support for external label services. The label service layer contains a series of services for end applications, such as the label API service and the dynamic labeling service, so that valuable data can quickly be exposed as services.
At present, the field mainly relies on a data synchronization engine (DataX), big data storage (Hive), and a search engine (ElasticSearch) to implement data synchronization, data storage, and label search during label creation. In practice, label creation involves many steps, takes a long time, and has the following defects:
1. Low data synchronization efficiency. Creating a label relies on the DataX platform to synchronize the raw data into the label storage engine, and synchronization is slow when the data volume is large. After the label is created, DataX is needed again to synchronize the data to ElasticSearch, which adds system complexity and makes label data less efficient to use. The number of labels under a subject determines the number of columns in the static label table; when that table has too many columns, data synchronization becomes a bottleneck.
2. Poor data timeliness. Before a label is created, part of the label data must be previewed according to the creation rules, and label data is provided to external platforms through API services, so query responses cannot be slow. Labels are stored in Hive. Taking user labels as an example, the storage layout is a profile table with userid as the primary key, whose other fields are the feature fields of the profile. The selected audience is matched against the profile table with an in operation, followed by a group by operation. Whenever a feature field is added or removed, the structure of the profile table has to be modified. When the selected audience is large, the group by runs over a large record set; Hive statements then perform poorly and with high latency, so query results cannot be obtained quickly and real-time scenarios are not served. In addition, label retrieval works by storing the labels in a wide table in ElasticSearch. Inserting data into the wide table requires waiting until all business data is ready before the join job can run and the joined result can be inserted into ElasticSearch. A delayed task from a single business party often prevents the join job that feeds ElasticSearch from running, so operations staff cannot use the latest profile data in time.
3. Data redundancy. Under the data rules, the value of each label of each object occupies one row of storage, which causes heavy redundancy in data items such as label names and label values. Labeling stores multiple redundant records in both the static label table and the dynamic label table.
4. Limited label semantics. Labels can only be created from raw static data. There is no way to create new labels by aggregating existing labels, so a single label can describe only a narrow range of features. The labeling process depends entirely on SQL statements over database tables, and a subject object can only select one database under one data source, so labeling cannot use queries that join tables from different databases in different sources.
Therefore, how to build labels of various dimensions and semantics, and how to implement a label management platform that can be queried quickly and in real time, is a problem in urgent need of a solution.
Accordingly, providing an efficient and more widely applicable label management method has become a technical problem to be solved.
Summary of the invention
In view of this, the present invention mainly addresses the technical problems described above.
In one aspect, the present invention provides a label management method, comprising:
Step S1: create a subject object; the back end correspondingly creates label storage tables in the ClickHouse warehouse;
Step S2: obtain an external data source, use FlinkX to map the structure and data of the database tables to be labeled into ClickHouse, and associate them with the subject object;
Step S3: create static labels under the subject object;
Step S4: create dynamic labels under the subject object through SQL or drag-and-drop mode;
Step S5: publish the dynamic labels and static labels under the subject object as API interfaces.
Further, in step S1 of the label management method of the present invention, the label storage tables include a static label storage table and a dynamic label storage table. The static label storage table stores the data corresponding to the basic attributes of objects. The dynamic label storage table stores labels that are obtained by computing on and aggregating static labels and that can describe a batch of objects.
Further, step S2 of the label management method of the present invention includes:
Step S21: obtain and save an external data source, and select one data table from it as the label data source table;
Step S22: obtain the field information of the label data source table and map it to ClickHouse field types;
Step S23: assemble a table-creation statement from the fields and execute it to create the target table;
Step S24: assemble the label data source table and the target table into FlinkX task execution parameters and submit the task so that the data is synchronized into ClickHouse.
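Steps S22 and S23 can be illustrated with a minimal sketch. The source names neither the source database nor a concrete type mapping, so the MySQL-style type names, the `TYPE_MAP` table, the `MergeTree` engine choice, and the table and column names below are all assumptions made for illustration.

```python
# Hypothetical mapping from MySQL-style source types to ClickHouse types.
# The source does not list the actual mapping; this table is an assumption.
TYPE_MAP = {
    "int": "Int32",
    "bigint": "Int64",
    "varchar": "String",
    "text": "String",
    "double": "Float64",
    "datetime": "DateTime",
    "date": "Date",
}


def to_clickhouse_type(source_type: str) -> str:
    """Map a source type such as 'varchar(64)' to a ClickHouse type (step S22)."""
    base = source_type.lower().split("(")[0].strip()
    # Unknown types fall back to String in this sketch.
    return TYPE_MAP.get(base, "String")


def build_create_table(table: str, fields: list, order_by: str) -> str:
    """Assemble the table-creation statement for the target table (step S23)."""
    columns = ",\n".join(
        f"    `{name}` {to_clickhouse_type(src_type)}" for name, src_type in fields
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n{columns}\n) "
        f"ENGINE = MergeTree ORDER BY `{order_by}`"
    )


# Hypothetical label data source table with three fields.
ddl = build_create_table(
    "label_src_user",
    [("user_id", "bigint"), ("city", "varchar(64)"), ("signup_at", "datetime")],
    order_by="user_id",
)
print(ddl)
```

The statement would then be executed against ClickHouse by whatever client the back end uses; that call is left out because the source does not describe it.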
Further, step S3 of the label management method of the present invention includes:
Step S31: select one or more columns of an added data source table as the label data source;
Step S32: add a static label column to the static label storage table;
Step S33: assemble the labeling SQL statement;
Step S34: copy the structure of the static label storage table to create a new table;
Step S35: execute the labeling SQL statement to write the label data into the new table, then drop the static label storage table that was copied;
Step S36: rename the new table to the name of the static label storage table that was copied.
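The copy, write, and rename sequence of steps S32 to S36 can be sketched as an ordered list of ClickHouse statements. The source does not give the labeling SQL itself, so the `LEFT JOIN` form, the `_tmp` suffix, and all table and column names here are hypothetical. The sketch also assumes the key column comes first in the storage table and that the new label column is appended last, because `INSERT ... SELECT` matches columns by position.

```python
def static_label_statements(label_table: str, source_table: str, key: str,
                            label: str, source_column: str, ch_type: str,
                            existing_labels: list) -> list:
    """Return the statements for steps S32 to S36 in execution order."""
    tmp = f"{label_table}_tmp"  # hypothetical name for the new table
    kept = "".join(f", t.`{c}`" for c in existing_labels)
    return [
        # S32: add the new static label column to the storage table.
        f"ALTER TABLE {label_table} ADD COLUMN `{label}` {ch_type}",
        # S34: copy the structure of the storage table into a new, empty table.
        f"CREATE TABLE {tmp} AS {label_table}",
        # S33 and S35: the labeling SQL writes the existing labels plus the
        # new label values into the new table (assumed LEFT JOIN form).
        f"INSERT INTO {tmp} SELECT t.`{key}`{kept}, s.`{source_column}` "
        f"FROM {label_table} AS t LEFT JOIN {source_table} AS s "
        f"ON t.`{key}` = s.`{key}`",
        # S35: drop the storage table that was copied.
        f"DROP TABLE {label_table}",
        # S36: give the new table the original name.
        f"RENAME TABLE {tmp} TO {label_table}",
    ]


stmts = static_label_statements(
    "static_label_user", "label_src_user", "user_id",
    label="city", source_column="city", ch_type="String",
    existing_labels=["age"],
)
for s in stmts:
    print(s + ";")
```

One plausible reason for this design, not stated in the source, is that in-place row updates in ClickHouse run as mutations, which are comparatively heavy, whereas bulk-writing a fresh table and renaming it is a cheap operation.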
Further, step S4 of the label management method of the present invention includes: performing set operations on static labels in combination with SQL operators to create dynamic labels.
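As a small illustration of set operations over static labels, each static label can be viewed as the set of object IDs that carry it, and a dynamic label as the result of combining those sets. The label names and IDs below are invented for the example, and the operator names mirror the SQL set operators `INTERSECT`, `UNION`, and `EXCEPT`; the source does not say which operators the platform exposes.

```python
# Hypothetical static labels, each represented by the object IDs that carry it.
static_labels = {
    "city_is_beijing": {1, 2, 3, 5},
    "age_over_30": {2, 3, 4},
    "is_vip": {3, 5, 6},
}


def combine(op: str, left: set, right: set) -> set:
    """Apply one SQL-style set operator to two sets of object IDs."""
    if op == "INTERSECT":
        return left & right
    if op == "UNION":
        return left | right
    if op == "EXCEPT":
        return left - right
    raise ValueError(f"unsupported operator: {op}")


# Dynamic label: objects in Beijing and over 30 that are not VIPs.
beijing_over_30 = combine(
    "INTERSECT", static_labels["city_is_beijing"], static_labels["age_over_30"]
)
dynamic_label = combine("EXCEPT", beijing_over_30, static_labels["is_vip"])
print(sorted(dynamic_label))
```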
Further, step S4 of the label management method of the present invention includes:
Step S41: establish the dynamic label creation rules;
Step S42: the front end assembles an expression in JSON format; the back end converts the JSON into an object and checks whether the expression conforms to the dynamic label creation rules;
Step S43: perform set operations on static labels in combination with SQL operators to create the dynamic label; parse the expression, build the labeling SQL for the dynamic label, and execute it;
Step S44: check the execution status of the dynamic label's labeling SQL and generate the value-range distribution for the dynamic label.
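Steps S42 and S43 can be sketched as follows. The source specifies neither the JSON schema nor the creation rules, so the `logic`/`children`/`label`/`op`/`value` keys, the allowed operator sets, and the table names are assumptions. A production system would normally bind values as query parameters instead of inlining literals.

```python
import json

ALLOWED_LOGIC = {"and", "or"}                       # assumed rule set
ALLOWED_COMPARE = {"=", "!=", ">", ">=", "<", "<="}  # assumed rule set


def validate(node: dict, known_labels: set) -> None:
    """Raise ValueError if the expression breaks the assumed creation rules."""
    if "logic" in node:
        children = node.get("children", [])
        if node["logic"] not in ALLOWED_LOGIC or len(children) < 2:
            raise ValueError(f"invalid logic node: {node}")
        for child in children:
            validate(child, known_labels)
    elif (node.get("label") not in known_labels
          or node.get("op") not in ALLOWED_COMPARE
          or "value" not in node):
        raise ValueError(f"invalid condition: {node}")


def to_sql(node: dict) -> str:
    """Translate a validated expression tree into a SQL boolean expression."""
    if "logic" in node:
        joiner = f" {node['logic'].upper()} "
        return "(" + joiner.join(to_sql(c) for c in node["children"]) + ")"
    value = node["value"]
    if isinstance(value, (int, float)):
        literal = str(value)
    else:
        # Backslash-escape backslashes and quotes inside string literals.
        literal = "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"
    return f"`{node['label']}` {node['op']} {literal}"


# Expression as the front end might send it (hypothetical schema).
expr_json = (
    '{"logic": "and", "children": ['
    '{"label": "age", "op": ">=", "value": 30}, '
    '{"label": "city", "op": "=", "value": "Beijing"}]}'
)
expression = json.loads(expr_json)           # S42: JSON to object
validate(expression, {"age", "city"})        # S42: rule check
labeling_sql = (                             # S43: build the labeling SQL
    "INSERT INTO dynamic_label_user (label, user_id) "
    "SELECT 'senior_beijing', user_id FROM static_label_user WHERE "
    + to_sql(expression)
)
print(labeling_sql)
```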
Further, step S4 of the label management method of the present invention also includes: creating a new dynamic label by dragging dynamic labels and static labels and combining them with judgment conditions and filter conditions.
Further, step S5 of the label management method of the present invention includes: using the dynamic labels or static labels under the subject object as input or output parameters, and generating an API interface by drag-and-drop for third parties to call.
Further, step S5 of the label management method of the present invention also includes: dragging the subject object onto the SQL editing page, which automatically brings up the static label storage table, the dynamic label storage table, and their fields for that subject object, and writing SQL to generate an API interface for third-party platforms to call.
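The idea of treating labels as input and output parameters of a published API can be sketched as a function that turns the chosen labels into a parameterized query. The source does not describe the generated API format, so the `path`/`params`/`sql` structure, the URL layout, and the pyformat-style `%(name)s` placeholders are assumptions.

```python
def build_api(name: str, table: str, input_labels: list, output_labels: list) -> dict:
    """Describe an API whose inputs filter on labels and whose outputs return labels."""
    if not output_labels:
        raise ValueError("at least one output label is required")
    select = ", ".join(f"`{c}`" for c in output_labels)
    sql = f"SELECT {select} FROM {table}"
    if input_labels:
        # Each input label becomes a named placeholder bound at call time.
        sql += " WHERE " + " AND ".join(f"`{c}` = %({c})s" for c in input_labels)
    return {"path": f"/api/labels/{name}", "params": list(input_labels), "sql": sql}


# Hypothetical API: look up users and their age by city.
api = build_api("users_by_city", "static_label_user", ["city"], ["user_id", "age"])
print(api["sql"])
```

A caller of the published interface would supply a value for each entry in `params`, and the service would bind it to the matching placeholder before running the query.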
In another aspect, the present invention provides a label management device, comprising:
A label storage table creation module, configured to create a subject object, with the back end correspondingly creating label storage tables in the ClickHouse warehouse, where the static label storage table stores the data corresponding to the basic attributes of objects, and the dynamic label storage table stores labels that are obtained by computing on and aggregating static labels and that can describe a batch of objects;
A data synchronization module, configured to obtain and save an external data source and select one data table from it as the label data source table; obtain the field information of the label data source table and map it to ClickHouse field types; assemble a table-creation statement from the fields and execute it to create the target table; and assemble the label data source table and the target table into FlinkX task execution parameters and submit the task so that the data is synchronized into ClickHouse;
A static label creation module, configured to select one or more columns of an added data source table as the label data source; add a static label column to the static label storage table; assemble the labeling SQL statement; copy the structure of the static label storage table to create a new table; execute the labeling SQL statement to write the label data into the new table, then drop the static label storage table that was copied; and rename the new table to the name of the static label storage table that was copied;
A dynamic label creation module, configured to establish the dynamic label creation rules; have the front end assemble an expression in JSON format and the back end convert the JSON into an object and check whether the expression conforms to the dynamic label creation rules; perform set operations on static labels in combination with SQL operators to create the dynamic label; parse the expression, build the labeling SQL for the dynamic label, and execute it; check the execution status of the labeling SQL and generate the value-range distribution for the dynamic label; and also create a new dynamic label by dragging dynamic labels and static labels and combining them with judgment conditions and filter conditions;
A label publishing module, configured to use the dynamic labels or static labels under the subject object as input or output parameters and generate an API interface by drag-and-drop for third parties to call; and also configured so that dragging the subject object onto the SQL editing page automatically brings up the static label storage table, the dynamic label storage table, and their fields for that subject object, after which SQL is written to generate an API interface for third-party platforms to call.
The label management method and device of the present invention have the following beneficial effects:
1) Label data is built in parallel, which speeds up label data production;
2) HDFS files are imported into ClickHouse concurrently, which makes label data ready sooner;
3) The average response time of query requests is reduced;
4) Near-real-time updates of label data are supported;
5) Label expressions and query SQL are user-friendly, which makes the system easier to maintain;
6) Labeling and querying are both handled inside ClickHouse, which can save half of the hardware resources.
Description of drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a label management method according to an exemplary first embodiment of the present invention.
FIG. 2 is a flowchart of a label management method according to an exemplary second embodiment of the present invention.
FIG. 3 is a flowchart of a label management method according to an exemplary third embodiment of the present invention.
FIG. 4 is a flowchart of a label management method according to an exemplary fourth embodiment of the present invention.
FIG. 5 is an architecture diagram of a label management device according to an exemplary eighth embodiment of the present invention.
Detailed description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be noted that, where there is no conflict, the following embodiments and the features in them can be combined with one another. In addition, all other embodiments obtained by a person of ordinary skill in the art from the embodiments in this disclosure without creative effort fall within the scope of protection of this disclosure.
It should also be noted that the following describes various aspects of embodiments within the scope of the appended claims. The aspects described herein can be embodied in a wide variety of forms, and any specific structure and/or function described herein is merely illustrative. Based on this disclosure, those skilled in the art should understand that an aspect described herein can be implemented independently of any other aspect, and that two or more of these aspects can be combined in various ways. For example, a device can be implemented and/or a method can be practiced using any number of the aspects set forth herein. In addition, such a device can be implemented and/or such a method can be practiced using structures and/or functionality other than one or more of the aspects set forth herein.
FIG. 1 is a flowchart of a label management method according to an exemplary first embodiment of the present invention. As shown in FIG. 1, the method of this embodiment includes:
Step S1: create a subject object; the back end correspondingly creates label storage tables in the ClickHouse warehouse;
Step S2: obtain an external data source, use FlinkX to map the structure and data of the database tables to be labeled into ClickHouse, and associate them with the subject object;
Step S3: create static labels under the subject object;
Step S4: create dynamic labels under the subject object through SQL or drag-and-drop mode;
Step S5: publish the dynamic labels and static labels under the subject object as API interfaces.
In step S1 of the method of this embodiment, the label storage tables include a static label storage table and a dynamic label storage table. The static label storage table stores the data corresponding to the basic attributes of objects. The dynamic label storage table stores labels that are obtained by computing on and aggregating static labels and that can describe a batch of objects.
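To make step S1 concrete, the following sketch generates table-creation statements for the two storage tables of one subject object. The source does not disclose the table schemas, so the naming scheme, the one-row-per-object layout of the static table, the one-row-per-(label, object) layout of the dynamic table, and the `MergeTree` engine are all assumptions.

```python
def label_table_ddl(subject: str, key: str, key_type: str = "Int64") -> dict:
    """Return CREATE TABLE statements for a subject object's two label tables."""
    static_table = f"static_label_{subject}"    # hypothetical naming scheme
    dynamic_table = f"dynamic_label_{subject}"  # hypothetical naming scheme
    return {
        # Assumed wide layout: one row per object; static label columns are
        # added later, when static labels are created.
        static_table: (
            f"CREATE TABLE IF NOT EXISTS {static_table} (`{key}` {key_type}) "
            f"ENGINE = MergeTree ORDER BY `{key}`"
        ),
        # Assumed narrow layout: one row per (dynamic label, object) pair.
        dynamic_table: (
            f"CREATE TABLE IF NOT EXISTS {dynamic_table} "
            f"(`label` String, `{key}` {key_type}) "
            f"ENGINE = MergeTree ORDER BY (`label`, `{key}`)"
        ),
    }


tables = label_table_ddl("user", "user_id")
for ddl in tables.values():
    print(ddl + ";")
```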
FIG. 2 is a flowchart of a label management method according to an exemplary second embodiment of the present invention. This embodiment is a preferred embodiment of the method shown in FIG. 1. As shown in FIG. 2, step S2 of the method of this embodiment includes:
Step S21: obtain and save an external data source, and select one data table from it as the label data source table;
Step S22: obtain the field information of the label data source table and map it to ClickHouse field types;
Step S23: assemble a table-creation statement from the fields and execute it to create the target table;
Step S24: assemble the label data source table and the target table into FlinkX task execution parameters and submit the task so that the data is synchronized into ClickHouse.
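Step S24 can be pictured as building a job description that pairs a reader for the source table with a writer for the ClickHouse target table. The sketch below follows the DataX-style reader/writer JSON layout that FlinkX job files are modeled on. The plugin names, key names, JDBC URLs, and table names are assumptions and should be checked against the documentation of the FlinkX version in use; credentials are omitted.

```python
import json


def build_sync_job(source_jdbc: str, source_table: str,
                   target_jdbc: str, target_table: str, columns: list) -> dict:
    """Assemble a DataX-style job description for one source-to-target sync."""
    reader = {
        "name": "mysqlreader",  # assumed plugin name for a MySQL source
        "parameter": {
            "column": columns,
            "connection": [{"jdbcUrl": [source_jdbc], "table": [source_table]}],
        },
    }
    writer = {
        "name": "clickhousewriter",  # assumed plugin name for the ClickHouse sink
        "parameter": {
            "column": columns,
            "connection": [{"jdbcUrl": target_jdbc, "table": [target_table]}],
            "writeMode": "insert",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": 1}},
        }
    }


# Hypothetical endpoints and table names.
job = build_sync_job(
    "jdbc:mysql://source-host:3306/crm", "user",
    "jdbc:clickhouse://ch-host:8123/labels", "label_src_user",
    ["user_id", "city", "signup_at"],
)
print(json.dumps(job, indent=2))
```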
FIG. 3 is a flowchart of a label management method according to an exemplary third embodiment of the present invention. This embodiment is a preferred embodiment of the method shown in FIG. 1. As shown in FIG. 3, step S3 of the method of this embodiment includes:
Step S31: select one or more columns of an added data source table as the label data source;
Step S32: add a static label column to the static label storage table;
Step S33: assemble the labeling SQL statement;
Step S34: copy the structure of the static label storage table to create a new table;
Step S35: execute the labeling SQL statement to write the label data into the new table, then drop the static label storage table that was copied;
Step S36: rename the new table to the name of the static label storage table that was copied.
FIG. 4 is a flowchart of a label management method according to an exemplary fourth embodiment of the present invention. This embodiment is a preferred embodiment of the method shown in FIG. 1. As shown in FIG. 4, step S4 of the method of this embodiment includes: performing set operations on static labels in combination with SQL operators to create dynamic labels.
Step S4 of the method of this embodiment includes:
Step S41: establish the dynamic label creation rules;
Step S42: the front end assembles an expression in JSON format; the back end converts the JSON into an object and checks whether the expression conforms to the dynamic label creation rules;
Step S43: perform set operations on static labels in combination with SQL operators to create the dynamic label; parse the expression, build the labeling SQL for the dynamic label, and execute it;
Step S44: check the execution status of the dynamic label's labeling SQL and generate the value-range distribution for the dynamic label.
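The value-range distribution in step S44 can be read as a per-value count over the rows that the labeling SQL produced. The sketch below assumes dynamic label rows of the form `(label, object_id)` and computes counts and shares in memory; the row layout, the sample data, and the table name in the equivalent query are hypothetical.

```python
from collections import Counter

# Equivalent aggregation as a ClickHouse query over an assumed table.
DISTRIBUTION_SQL = (
    "SELECT label, count() AS objects FROM dynamic_label_user GROUP BY label"
)


def value_distribution(rows: list) -> dict:
    """Count objects per label value and compute each value's share."""
    counts = Counter(label for label, _object_id in rows)
    total = sum(counts.values())
    return {
        label: {"count": n, "share": n / total} for label, n in counts.items()
    }


# Hypothetical rows written by a dynamic label's labeling SQL.
rows = [
    ("senior_beijing", 2),
    ("senior_beijing", 7),
    ("vip_only", 3),
    ("senior_beijing", 9),
]
distribution = value_distribution(rows)
print(distribution)
```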
An exemplary fifth embodiment of the present invention provides a label management method. This embodiment is a preferred embodiment of the method shown in FIG. 1. Step S4 of the method of this embodiment also includes: creating a new dynamic label by dragging dynamic labels and static labels and combining them with judgment conditions and filter conditions. In the label management method of this embodiment, labels can be dragged in a visual editing interface, and connective conditions such as AND and OR, together with judgment conditions such as magnitude comparisons, can be set graphically to assemble a new label.
An exemplary sixth embodiment of the present invention provides a label management method. This embodiment is a preferred embodiment of the method shown in FIG. 1. Step S5 of the method of this embodiment includes: using the dynamic labels or static labels under the subject object as input or output parameters, and generating an API interface by drag-and-drop for third parties to call.
An exemplary seventh embodiment of the present invention provides a label management method. This embodiment is a preferred embodiment of the method shown in FIG. 1. Step S5 of the method of this embodiment includes: dragging the subject object onto the SQL editing page, which automatically brings up the static label storage table, the dynamic label storage table, and their fields for that subject object, and writing SQL to generate an API interface for third-party platforms to call.
FIG. 5 is an architecture diagram of a label management apparatus according to an exemplary eighth embodiment of the present invention. As shown in FIG. 5, the label management apparatus of this embodiment includes:
a label storage table creation module, configured to create a subject object, whereupon the back end correspondingly creates label storage tables in the ClickHouse warehouse, wherein the static label storage table stores the data corresponding to the basic attributes of objects, and the dynamic label storage table stores labels that are obtained by computing over and aggregating static labels and that can describe a batch of objects;
a data synchronization module, configured to obtain and save external data sources, and select one data table from an external data source as the label data source table; obtain the field information of the label data source table and map it to ClickHouse field types; assemble a table creation statement from the fields, and establish the target table by executing its table creation statement; and assemble the label data source table and the target table into FlinkX task execution parameters and submit them to ClickHouse for data synchronization;
a static label creation module, configured to select one or more columns of an added data source table as the label data source; add a static label column to the static label storage table; assemble a labeling SQL statement; copy the structure of the static label storage table to create a new table; execute the labeling SQL statement to write the label data into the new table, and delete the copied static label storage table; and rename the new table to the name of the copied static label storage table;
a dynamic label creation module, configured to establish dynamic label creation rules, wherein the front end assembles an expression in JSON format, and the back end converts the JSON into an object and verifies whether the expression conforms to the dynamic label creation rules; perform set operations on static labels in combination with SQL operators to create a dynamic label; parse the expression, construct the labeling SQL of the dynamic label, and execute it; check the execution status of the labeling SQL of the dynamic label, and produce the value-range distribution of the dynamic label; and further configured to create a new dynamic label by dragging dynamic labels and static labels and combining them with judgment conditions and filter conditions;
a label publishing module, configured to generate, by drag and drop, an API interface for third parties to call, using the dynamic labels or static labels under the subject object as input or output parameters; and further configured to, when the subject object is dragged onto the SQL editing page, automatically bring up the static label storage table and the dynamic label storage table corresponding to the subject object, together with their fields, so that SQL can be written to generate an API interface for third-party platforms to call.
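By way of a non-limiting illustration, the sequence performed by the static label creation module (add a column, copy the table structure, write the labeled data into the new table, delete the old table, rename the new table) can be sketched in Python as a generator of SQL statements. The statement forms used (`ALTER TABLE ... ADD COLUMN`, `CREATE TABLE ... AS`, `INSERT INTO ... SELECT`, `DROP TABLE`, `RENAME TABLE`) exist in ClickHouse, but the embodiment does not state which exact statements are issued. All table names, column names, the `_new` suffix, and the `LEFT JOIN` on a shared key are assumptions made for the sketch.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(*names):
    """Reject anything that is not a plain identifier before splicing it into SQL."""
    for name in names:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def build_static_label_statements(static_table, source_table, key,
                                  existing_columns, label_column, label_type,
                                  source_column):
    """Return the SQL statements, in execution order, for adding one static label.

    label_type is assumed to come from a trusted type-mapping step.
    """
    _check(static_table, source_table, key, label_column, source_column,
           *existing_columns)
    new_table = f"{static_table}_new"  # hypothetical naming scheme
    targets = ", ".join([*existing_columns, label_column])
    kept = ", ".join(f"s.{column}" for column in existing_columns)
    # Labeling SQL: carry over existing columns and fill the new label column
    # from the data source table (assumed to share the key column).
    labeling_sql = (
        f"INSERT INTO {new_table} ({targets}) "
        f"SELECT {kept}, d.{source_column} AS {label_column} "
        f"FROM {static_table} AS s "
        f"LEFT JOIN {source_table} AS d ON s.{key} = d.{key}"
    )
    return [
        f"ALTER TABLE {static_table} ADD COLUMN IF NOT EXISTS {label_column} {label_type}",
        f"CREATE TABLE {new_table} AS {static_table}",  # copy structure only
        labeling_sql,
        f"DROP TABLE {static_table}",
        f"RENAME TABLE {new_table} TO {static_table}",
    ]


statements = build_static_label_statements(
    static_table="user_static_label",
    source_table="crm_users",
    key="user_id",
    existing_columns=["user_id", "age"],
    label_column="city",
    label_type="String",
    source_column="city_name",
)
for statement in statements:
    print(statement)
```

Writing into a copy and then renaming it means the full column is produced by one bulk insert instead of row-by-row updates, which appears to be the motivation for the copy-and-swap sequence described for this module.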
In practical application, the label management apparatus of this embodiment has the following characteristics:
1) Label data is built in parallel, which speeds up label data production and avoids the drawback that data synchronization must queue once a certain number is exceeded.
2) HDFS files are imported into ClickHouse concurrently, and labeling is performed directly against HDFS through a ClickHouse table engine, which speeds up the readiness of label data.
3) For single-table and multi-table join queries, the average response time of query requests is under 2 seconds, and that of complex queries is under 5 seconds.
4) Near-real-time update of label data is supported: labeling and querying are performed directly in ClickHouse, and API results are generated in real time. This avoids the drawback of the prior art, in which labeled data must first be synchronized to ES and can only be queried through the API interface on the following day.
5) Label expressions and query SQL are user-friendly, which improves the maintainability of the system. With columnar storage and logical computation based on bitmap intersection, union, and difference, the processing scripts are concise and efficient. This avoids the drawback of the prior art, in which the back-end query SQL requires complex row-column conversions.
6) Both labeling and querying are handled within ClickHouse. Compared with the approach of first storing data in Hive, then synchronizing it to ES, and only then publishing the API, this can save half of the hardware resources.
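By way of a non-limiting illustration, the bitmap-based intersection, union, and difference mentioned in characteristic 5) can be sketched in Python, using an integer as a bitmap in which bit i is set when the object with ID i carries a label. ClickHouse provides bitmap functions such as `bitmapAnd`, `bitmapOr`, and `bitmapAndnot` for the same operations; the embodiment does not state which functions it uses, and the label names and object IDs below are made up for the example.

```python
def to_bitmap(object_ids):
    """Encode a collection of non-negative object IDs as an integer bitmap."""
    bitmap = 0
    for object_id in object_ids:
        bitmap |= 1 << object_id
    return bitmap


def to_ids(bitmap):
    """Decode an integer bitmap back into a sorted list of object IDs."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Hypothetical static labels: which object IDs carry each label.
high_value = to_bitmap([1, 3, 5, 8])
active = to_bitmap([3, 4, 5, 9])

both = high_value & active            # intersection: high-value AND active
either = high_value | active          # union: high-value OR active
only_high_value = high_value & ~active  # difference: high-value but NOT active

print(to_ids(both))             # [3, 5]
print(to_ids(either))           # [1, 3, 4, 5, 8, 9]
print(to_ids(only_high_value))  # [1, 8]
```

Each set operation is a single bitwise operation over the whole population, which is why combining labels in this representation needs no row-column conversion.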
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210847895.7A CN115168361A (en) | 2022-07-19 | 2022-07-19 | Label management method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115168361A (en) | 2022-10-11 |
Family
ID=83495649
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210847895.7A (Pending), CN115168361A (en) | Label management method and device | 2022-07-19 | 2022-07-19 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115168361A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |