CN101216839A - Method and device for network data collection - Google Patents
- Publication number
- CN101216839A
- Authority
- CN
- China
- Prior art keywords
- data
- instructions
- module
- instruction
- work
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Computer And Data Communications (AREA)
Abstract
The present invention provides a device and a method for centralizing network data. The device comprises a scheduler, a work engine, and a data integrator. The scheduler parses a data source acquisition request from a terminal, generates work instructions related to data access and data instructions related to data integration, issues them to the work engine and the data integrator respectively, and encapsulates the returned data result set as a data source that it returns to the end user. The work engine creates and starts the corresponding tasks to acquire data according to the work instructions. The data integrator integrates the acquired data into a data result set in a unified format according to the data instructions and returns it to the scheduler. With this device and method, users need not be concerned with how the data is integrated, which reduces developers' workload. The whole device can also be easily integrated into other systems, and is flexible and extensible.
Description
Technical field
The present invention relates to data processing technology in computer software and, more specifically, to a method and a device for centralizing data of different formats over a network.
Background
As computer software has developed, a large number of different software systems have come into existence. Each system outputs and stores its data differently, often in a different format. How to integrate these data effectively, so that an application system can access them as a single whole, is a problem that application systems frequently face. Some commercial systems provide data integration frameworks, but they require users to write their software within the framework, which limits flexibility, incurs considerable performance overhead, and supports only a limited set of data formats. In practice, when such data is needed it often has to be processed in advance, for example by loading everything into a database before use. This imposes a heavy workload on developers, who must first read the data and then insert it into the database, and it is very inconvenient to use. Moreover, because the ways of operating on the data vary and the storage formats differ, reading the data and loading it into a database is itself no easy task. How to treat data stored at different locations on a network as a single data source that can be conveniently connected to an application system, while reducing developers' workload, is therefore a difficult problem.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and a device for network data centralization, so that data of different formats on a network can be centralized.
To solve the above problem, the present invention provides a network data centralization method comprising the following steps:
(a) receiving and parsing a data source acquisition request sent by a terminal, and generating work instructions related to data access and data instructions related to data integration;
(b) creating and starting the corresponding tasks according to the work instructions, to acquire data;
(c) integrating the acquired data into a unified data result set according to the data instructions;
(d) encapsulating the data result set as a data source and returning it to the end user.
Further, in step (a), the data source acquisition request, or the configuration file corresponding to it, includes the methods for acquiring data at different locations on the network, the data join and filtering rules, and the format of the final data set.
Further, in step (b), the work instructions are classified according to the network address and file format of each instruction sequence, and one task is created for each class.
Further, in step (c), the acquired data is filtered, joined, and formatted in turn according to the data instructions, to obtain a unified data result set.
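Steps (a) to (d) can be read as a four-stage pipeline. The following is a minimal Python sketch of that reading, not the patented implementation: the request layout (`sources`, `integration`), the `readers` mapping, and all function names are assumptions introduced for illustration, the "join" is reduced to simple concatenation, and the per-class task creation of step (b) is omitted for brevity.

```python
def parse_request(request):
    """(a) Split a request into work instructions and data instructions."""
    return request["sources"], request["integration"]


def acquire(work_instructions, readers):
    """(b) Run one acquisition per work instruction and collect the data sets."""
    return [readers[w["format"]](w) for w in work_instructions]


def integrate(data_sets, data_instructions):
    """(c) Filter, then join (here: plain concatenation), then format."""
    keep = data_instructions["filter"]
    columns = data_instructions["columns"]
    rows = [row for data_set in data_sets for row in data_set if keep(row)]
    return [{c: row.get(c) for c in columns} for row in rows]


def centralize(request, readers):
    """(d) Return the unified result set wrapped as an iterable data source."""
    work, data = parse_request(request)
    return iter(integrate(acquire(work, readers), data))


# Hypothetical in-memory "reader" standing in for FTP or database access.
readers = {"memory": lambda w: w["rows"]}
request = {
    "sources": [
        {"format": "memory", "rows": [{"id": 1, "name": "a", "x": 0}]},
        {"format": "memory", "rows": [{"id": 2, "name": "b", "x": 9}]},
    ],
    "integration": {"filter": lambda r: r["id"] > 1, "columns": ["id", "name"]},
}
result = list(centralize(request, readers))
```

The caller only sees the iterable returned by `centralize`, which mirrors the stated goal that users need not care how the data was gathered or combined.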
To solve the above technical problem, the present invention further provides a network data centralization device comprising a scheduler, a work engine, and a data integrator, wherein:
The scheduler parses the data source acquisition request from the terminal, generates work instructions related to data access and data instructions related to data integration, issues them to the work engine and the data integrator respectively, and encapsulates the returned data result set as a data source that it returns to the end user;
The work engine creates and starts the corresponding tasks to acquire data according to the work instructions;
The data integrator integrates the acquired data into a data result set in a unified format according to the data instructions, and returns it to the scheduler.
Further, the device also comprises a memory for storing a number of preset configuration files, whose content includes the methods for acquiring data at different locations on the network, the data join and filtering rules, and the format of the final data set.
Further, the scheduler comprises a configuration file acquisition module, a configuration file parsing module, and a data source encapsulation module, wherein:
The configuration file acquisition module receives the data source acquisition request from the terminal and, according to the information contained in the request, locates the corresponding configuration file;
The configuration file parsing module parses the content of the retrieved configuration file to obtain the information related to data access and the information related to data integration, converts these into work instructions that the work engine can recognize and process and data instructions that the data integrator can recognize and process, and sends the two sets of instructions to the work engine and the data integrator respectively;
The data source encapsulation module encapsulates the data result set it receives, converts the data set into a data source, returns it to the terminal in the form of a data source, and provides interface methods for accessing the data records.
Further, the work engine comprises an instruction receiving module, an instruction analysis module, and a data acquisition module, wherein:
The instruction receiving module receives the work instructions and buffers them until all instructions have been received, then passes them to the instruction analysis module;
The instruction analysis module classifies the received instructions;
The data acquisition module looks up the stored instructions according to the classification, obtains the data access method, and creates data acquisition tasks from the retrieved instructions, one task per class. Each task is configured with the resources and operation steps needed to acquire the data, and is started once it is ready.
Further, the instruction analysis module classifies instructions by the network address and file format of each instruction sequence.
Further, the data integrator comprises an instruction receiving module, an instruction classification module, a data set receiving module, and a data set processing module, wherein:
The instruction receiving module receives the data instructions from the scheduler and stores them;
The instruction classification module classifies the received instructions by function and converts them into the corresponding processing rules, namely data filtering rules, data join rules, and the data set format;
The data set receiving module receives all data result sets once it has been notified by the work engine;
The data set processing module filters, joins, and formats the data sets. Specifically, it takes the data sets one at a time, looks up the corresponding data filtering rule, and filters the data; each filtered result forms a temporary data set. Once all data sets have been filtered, it joins the temporary data sets according to the join rules until a single complete data set is formed, converts that data set to the defined data structure, and then passes it to the scheduler.
Compared with the prior art, the device and method of the present invention relieve users of the need to know how the data is integrated, which reduces developers' workload. The whole device can also be easily integrated into other systems, and is flexible and extensible.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall architecture of the network data centralization device of the present invention.
Fig. 2 is a schematic structural diagram of the scheduler of the network data centralization device of the present invention.
Fig. 3 is a schematic structural diagram of the work engine of the network data centralization device of the present invention.
Fig. 4 is a flowchart of the data integrator of the network data centralization device of the present invention.
Fig. 5 is a schematic flowchart of the network data centralization method of the present invention.
Detailed description
As shown in Fig. 1, the network data centralization device of the present invention centralizes network data in response to end user requests. The device comprises a scheduler 1, a work engine 2, a data integrator 3, and a memory 4, wherein:
The scheduler 1 parses the data source acquisition request from the terminal, generates work instructions and data instructions and issues them to the work engine and the data integrator respectively, and encapsulates the returned data result set as a data source that it returns to the end user;
The work engine 2 creates and starts the corresponding tasks to acquire data according to the work instructions;
The data integrator 3 integrates the acquired data into a data result set in a unified format according to the data instructions, and returns it to the scheduler.
The memory 4 stores a number of preset configuration files. The configuration files are saved as XML, and their content includes the methods for acquiring data at different locations on the network, the data join and filtering rules, and the format of the final data set.
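The patent states only that the configuration file is XML and lists what it contains; it does not give a schema. The sketch below shows one hypothetical layout and how it could be read with Python's standard `xml.etree.ElementTree`. Every element name, attribute name, and address here is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file: two sources plus integration rules.
CONFIG_XML = """
<datasource name="sales_report">
  <sources>
    <source id="orders" protocol="ftp" host="192.0.2.10"
            path="/data/orders.xls" format="excel"/>
    <source id="customers" protocol="db" host="192.0.2.20"
            database="crm" table="customers" format="table"/>
  </sources>
  <integration>
    <filter source="orders" field="amount" op="gt" value="100"/>
    <join left="orders" right="customers" on="customer_id"/>
    <format columns="customer_id,name,amount"/>
  </integration>
</datasource>
"""

root = ET.fromstring(CONFIG_XML)

# Data access information: the raw material for work instructions.
sources = [source.attrib for source in root.find("sources")]

# Data integration information: the raw material for data instructions.
integration = root.find("integration")
filters = [f.attrib for f in integration.findall("filter")]
joins = [j.attrib for j in integration.findall("join")]
columns = integration.find("format").get("columns").split(",")
```

Keeping access details under `sources` and integration rules under `integration` mirrors the split the scheduler makes between work instructions and data instructions.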
When the network data centralization device receives a data source acquisition request from a terminal, the scheduler uses the request to find the matching configuration file in the memory, and parses the work instructions and data instructions from it.
Presetting the configuration files in the memory is the preferred embodiment of the method and device, because it keeps the end user's operation simple. The device can also be simplified by omitting preset configuration files and carrying the corresponding information in the terminal's data source acquisition request instead. The scheduler then parses that information into the work instructions and data instructions to be issued to the work engine and the data integrator. In this case the information carried in the request is equivalent to a configuration file: it includes how the data is read and accessed, how the data is joined, and the data format of the result set.
The memory can also serve as a maintenance tool for the device and supports adding new data types. A new data type is added by extending the XML file with the access information for that type and a data reading class (that is, a description of how to read and process the data in the file, since each file format is read differently). The scheduler then converts the data acquisition operation into the corresponding work instructions for the work engine, which obtains the corresponding data set by executing them.
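One way to picture the "data reading class" extension point is a registry that maps a format name to a reader, so that a new format is supported by registering one more reader. The sketch below assumes this registry design; the decorator, the `READERS` table, and the CSV and JSON readers are illustrative choices and do not come from the patent.

```python
import csv
import io
import json

# Registry mapping a format name to the function that reads that format.
READERS = {}


def register_reader(fmt):
    """Decorator that registers a reader function for one data format."""
    def decorator(func):
        READERS[fmt] = func
        return func
    return decorator


@register_reader("csv")
def read_csv(raw):
    # Each row becomes a dict keyed by the header line.
    return list(csv.DictReader(io.StringIO(raw)))


@register_reader("json")
def read_json(raw):
    return json.loads(raw)


def read(fmt, raw):
    """Dispatch to the reader registered for `fmt`."""
    reader = READERS.get(fmt)
    if reader is None:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return reader(raw)
```

With this arrangement, supporting a further format means adding one decorated function; the dispatch code in `read` stays unchanged.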
The network data centralization device is described in detail below with reference to the drawings.
The scheduler 1 is mainly responsible for locating the configuration file that corresponds to the terminal's data source acquisition request and parsing it to obtain the work instructions for the work engine and the data instructions for the data integrator. The work instructions are a sequence of instructions for acquiring data at different locations on the network. The data instructions are a sequence of instructions covering the data join rules, the filtering rules, and the format of the final data set. The processing done by the work engine and the data integrator yields one data set, which the scheduler encapsulates as a data source and passes to the terminal.
As shown in Fig. 2, the scheduler 1 mainly comprises a configuration file acquisition module, a configuration file parsing module, and a data source encapsulation module, wherein:
The configuration file acquisition module receives the data source acquisition request from the terminal and, according to the information contained in the request, locates the corresponding configuration file;
The configuration file parsing module parses the content of the retrieved configuration file to obtain the information related to data access and the information related to data integration, converts these into work instructions that the work engine can recognize and process and data instructions that the data integrator can recognize and process, and sends the two sets of instructions to the work engine and the data integrator respectively;
The work instructions specify the location of a data source on the network and the methods for accessing it. The network location can be expressed with IP-related information, and the data source information corresponds to data sources of different formats at different network locations. Specifically, a work instruction is an instruction sequence that can be executed on top of the corresponding software. For FTP, for example, the work instructions are a sequence such as "open an IP address", "change to the directory containing the data file", and "retrieve the file".
If the data to be acquired is an Excel file on an FTP server, the parsed work instructions contain the FTP URL of the file, the user name and password required for access, and a file data reading class, which is used to obtain the data inside the file by calling the corresponding API (in a Java environment, for example, the POI package can read the data in an Excel file). If the source is a database, the parsed work instructions define resources such as the database type, the database name, the tables or views in the database, and the user name and password for the connection.
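The FTP example above can be sketched as a function that turns a file URL into an ordered list of steps. This is an illustrative model only: the `Step` type, the operation names, and the sample URL are assumptions, and the sketch builds the instruction sequence without opening any connection.

```python
from dataclasses import dataclass
from posixpath import basename, dirname
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Step:
    """One hypothetical work instruction: an operation and its argument."""
    op: str
    arg: str


def ftp_work_instructions(url):
    """Derive the open / change-directory / retrieve sequence from an FTP URL."""
    parts = urlsplit(url)
    return [
        Step("open", parts.hostname),                   # "open an IP address"
        Step("login", parts.username or "anonymous"),   # credentials for access
        Step("cwd", dirname(parts.path)),               # directory of the data file
        Step("retrieve", basename(parts.path)),         # fetch the file itself
    ]


steps = ftp_work_instructions("ftp://reader@192.0.2.10/data/reports/orders.xls")
```

An executor could map these steps onto a real client, for instance the `connect`, `login`, `cwd`, and `retrbinary` methods of Python's `ftplib.FTP`, and then hand the downloaded file to the reading class for its format.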
The data instructions are the rules for integrating the acquired data sets into a single data set. They consist of data filtering rules, join rules, and the data set format, and are issued to the data integrator in the form of data instructions. The filtering rules remove data that does not meet the stated conditions; the join rules assemble the data into one complete data set; the data set format specifies the data structure of the resulting data set.
The data source encapsulation module encapsulates the data result set it receives, converts the data set into a data source, returns it to the terminal in the form of a data source, and provides interface methods for accessing the data records.
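The patent does not specify what the record access interface looks like. As a rough sketch under that caveat, a data source could wrap the final data set and expose column names, a record count, iteration, and access by position. The class and method names below are hypothetical.

```python
class DataSource:
    """Hypothetical wrapper that exposes a result set through a small interface."""

    def __init__(self, columns, rows):
        self._columns = tuple(columns)
        self._rows = [tuple(row) for row in rows]

    @property
    def columns(self):
        """Column names defined by the data set format."""
        return self._columns

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        # Yield each record as a mapping from column name to value.
        for row in self._rows:
            yield dict(zip(self._columns, row))

    def record(self, index):
        """Return the record at the given position."""
        return dict(zip(self._columns, self._rows[index]))


source = DataSource(["id", "name"], [(1, "Ann"), (2, "Bo")])
```

The terminal works only with this object, for example `for record in source: ...`, and never sees where the rows originally came from.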
The work engine 2 starts the corresponding tasks according to the work instructions sent by the scheduler, retrieves the relevant data from the network, and passes the retrieved data to the data integrator for processing. As shown in Fig. 3, the work engine mainly comprises an instruction receiving module, an instruction analysis module, and a data acquisition module, wherein:
The instruction receiving module receives the work instructions and buffers them until all instructions have been received, then passes them to the instruction analysis module;
The instruction analysis module classifies the received instructions;
Instructions are classified by the network address and file format of each instruction sequence. For example, two Excel files at the same FTP address fall into the same class.
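The classification rule amounts to grouping instructions by the pair (network address, file format). A minimal Python sketch follows, assuming each work instruction is a dictionary with hypothetical `address`, `format`, and `path` keys.

```python
from collections import defaultdict


def classify(work_instructions):
    """Group work instructions by (network address, file format)."""
    classes = defaultdict(list)
    for instruction in work_instructions:
        key = (instruction["address"], instruction["format"])
        classes[key].append(instruction)
    return dict(classes)


instructions = [
    {"address": "ftp://192.0.2.10", "format": "excel", "path": "/data/a.xls"},
    {"address": "ftp://192.0.2.10", "format": "excel", "path": "/data/b.xls"},
    {"address": "db://192.0.2.20", "format": "table", "path": "customers"},
]
classes = classify(instructions)
```

Here the two Excel files on the same FTP address share one class, so a single task can serve both, while the database table gets a class of its own.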
The data acquisition module looks up the stored instructions according to the classification, obtains the data access method, and creates data acquisition tasks from the retrieved instructions, one task per class. Each task is configured with the resources and operation steps needed to acquire the data. Once a task is ready, its status is changed to executing and the task is started.
When a data acquisition task finishes, it sends a completion notification. On receiving the notification, the work engine checks whether all data acquisition tasks have completed, and keeps waiting if they have not. Once all tasks have completed, it notifies the data integrator to collect the data, and waits for the data integrator to take all of the acquired data result sets.
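The start-all, wait-for-all, then-notify behaviour described above can be sketched with a thread pool from Python's standard `concurrent.futures` module. The patent does not say that tasks run in threads, so the use of threads, the status strings, and the `on_all_done` callback standing in for the notification to the data integrator are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(tasks, on_all_done):
    """Start one task per class, track its status, and notify once all finish.

    `tasks` maps a task name to a zero-argument callable that returns a data set.
    """
    status = {name: "pending" for name in tasks}
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {}
        for name, fetch in tasks.items():
            status[name] = "executing"          # task is ready: mark and start it
            futures[pool.submit(fetch)] = name
        for future in as_completed(futures):    # one completion notice per task
            name = futures[future]
            results[name] = future.result()
            status[name] = "completed"
    on_all_done(results)                        # every task is done: hand over data
    return status


received = []
final_status = run_tasks(
    {"orders": lambda: [1], "customers": lambda: [2, 3]},
    received.append,
)
```

The callback fires exactly once, after the last task completes, which matches the rule that the data integrator is notified only when every acquisition task has finished.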
The data integrator 3 integrates the acquired data sets according to the rules passed by the scheduler, that is, it combines multiple data sets into one and returns that data set to the scheduler. As shown in Fig. 4, the data integrator mainly comprises an instruction receiving module, an instruction classification module, a data set receiving module, and a data set processing module, wherein:
The instruction receiving module receives the data instructions from the scheduler and stores them;
The instruction classification module classifies the received instructions by function and converts them into the corresponding processing rules, namely data filtering rules, data join rules, and the data set format;
The data set receiving module receives all data result sets once it has been notified by the work engine;
The data set processing module filters, joins, and formats the data sets. Specifically, it takes the data sets one at a time, looks up the corresponding data filtering rule, and filters the data; each filtered result forms a temporary data set. Once all data sets have been filtered, it joins the temporary data sets according to the join rules until a single complete data set is formed, converts that data set to the defined data structure, and then passes it to the scheduler.
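The filter, join, and format sequence can be sketched as three passes over dictionaries of rows. The patent does not define the join semantics, so the inner join on a single shared key used here is an assumption, as are the function signature and the sample data.

```python
def integrate(data_sets, filters, join_key, columns):
    """Filter each data set, join them on `join_key`, then format the result."""
    # 1. Filter: each data set yields a temporary data set.
    temporary = {}
    for name, rows in data_sets.items():
        keep = filters.get(name, lambda row: True)
        temporary[name] = [row for row in rows if keep(row)]

    # 2. Join: fold the temporary data sets together, one at a time.
    names = list(temporary)
    joined = temporary[names[0]]
    for name in names[1:]:
        index = {}
        for row in temporary[name]:
            index.setdefault(row[join_key], []).append(row)
        joined = [
            {**left, **right}
            for left in joined
            for right in index.get(left[join_key], [])
        ]

    # 3. Format: keep only the columns named by the data set format.
    return [{c: row.get(c) for c in columns} for row in joined]


orders = [{"customer_id": 1, "amount": 50}, {"customer_id": 2, "amount": 150}]
customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bo"}]
report = integrate(
    {"orders": orders, "customers": customers},
    filters={"orders": lambda row: row["amount"] > 100},
    join_key="customer_id",
    columns=["customer_id", "name", "amount"],
)
```

Filtering before joining, as the text prescribes, keeps the temporary data sets small, so the join touches only rows that can appear in the final result.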
The data centralization method of the present invention comprises the following steps:
Step A: receive and parse the data source acquisition request sent by the terminal, and generate work instructions related to data access and data instructions related to data integration;
Step B: create and start the corresponding tasks according to the work instructions, to acquire data;
Step C: integrate the acquired data into a unified data result set according to the data instructions;
Step D: encapsulate the data result set as a data source and return it to the end user.
Specifically, the network data centralization method can be implemented with the data centralization device of the present invention. The method is described in detail below with reference to the drawings:
As shown in Fig. 5, the overall flow of the data centralization method is as follows:
Step 1: provide an interface for obtaining an extended data source; the interface listens for data source acquisition requests sent by terminals.
Step 2: use the information in the request to find the corresponding configuration file, and read the file into memory.
Step 3: call the parsing module to parse the content of the file, obtaining the information related to data access and the information related to data integration.
Step 4: according to the definition of the data file, convert the data access information and the data integration information into work instructions and data instructions that the receiving modules can recognize and process;
Step 5: issue the work instructions to the work engine and the data instructions to the data integrator;
Step 6: the work engine receives the work instructions and buffers them until all instructions have been received, then classifies them until classification is complete;
Step 7: on receiving the notification that classification is complete, create one task for each class and configure the information the task needs to acquire its data; once a task has been created, change its status to executing and start it;
Each task is responsible for acquiring the data that shares one location and one format. When acquisition is complete, the task returns the data set and notifies the work engine that it has finished;
Step 8: listen for task completion notifications; when a notification is received, change the status of that task from executing to completed;
Step 9: check whether all tasks have finished; if so, go to step 10, otherwise return to step 8.
The data acquisition tasks are created and started one after another; once all of them have been started, the work engine enters a waiting state;
Step 10: send a data acquisition complete message to the data integrator; the data integrator receives the message and fetches the data sets from the work engine;
Step 11: filter, join, and format the data according to the data instructions generated in step 4;
Specifically, the data set processing module takes the data sets one at a time, looks up the corresponding data filtering rule, and filters the data; each filtered result forms a temporary data set. Once all data sets have been filtered, it joins the temporary data sets according to the join rules until a single complete data set is formed, converts that data set to the defined data structure, and then passes it to the scheduler.
Step 12: when data integration is complete, send a message to the scheduler and wait for the scheduler to take the final data set;
Step 13: the scheduler listens for the arrival of a data set; on receiving a message that a data set has been formed, it retrieves the data set from the data integrator;
Step 14: the scheduler encapsulates the retrieved data set as a data source and sends a handle for operating on the data source to the terminal.
Step 15: the terminal receives the data source handle and uses it to obtain the data.
Compared with the prior art, the present invention introduces a scheduler, a work engine, and a data integrator, which cooperate to centralize data on the network. Through the extension mechanism of the data file, the configuration file allows data in newly added formats to be accessed. The configuration file also defines the format of the returned data, so the result format can be customized. The scheduler can obtain data sources in different formats and use them directly, without preprocessing the data. The data can reside at different locations on the network and in different file formats, and is obtained directly through the relevant protocol or resource, such as FTP or database access. The data integrator combines the data collected by the work engine into the corresponding data set according to the rules set by the data instructions. During this process the data can be joined and filtered, and it can also be processed and converted by mapping, so the device provides a degree of data processing capability.
In addition, when the system needs to handle a data type in a new format, it is only necessary to extend the definition of the data file and add the processing logic to the scheduler's parsing class. Other types of data can then be connected, which improves the flexibility of the system.
With the device and method of the present invention, users need not be concerned with how the data is integrated, which reduces developers' workload. The whole device can also be easily integrated into other systems, and is flexible and extensible.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2008100041078A CN101216839B (en) | 2008-01-17 | 2008-01-17 | Network data centralization method and apparatus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2008100041078A CN101216839B (en) | 2008-01-17 | 2008-01-17 | Network data centralization method and apparatus |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101216839A (en) | 2008-07-09 |
| CN101216839B (en) | 2011-09-21 |
Family ID: 39623271
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2008100041078A (Expired - Fee Related), CN101216839B (en) | 2008-01-17 | 2008-01-17 | Network data centralization method and apparatus |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101216839B (en) |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102045402A (en) * | 2010-12-28 | 2011-05-04 | 云海创想信息技术(北京)有限公司 | Data acquisition method and data acquisition system of distributed file system |
| WO2013007190A1 (en) * | 2011-07-11 | 2013-01-17 | Shao Kaiyi | Task execution system, data processing device, and task issuing device and method |
| CN102955799A (en) * | 2011-08-25 | 2013-03-06 | 广州银禾网络通信有限公司 | Method and system for structured storage of cells in mobile communication network signaling |
| CN105308558A (en) * | 2012-12-10 | 2016-02-03 | 维迪特克公司 | Rules based data processing system and method |
| CN105763363A (en) * | 2016-01-20 | 2016-07-13 | 曾戟 | Method and device for realizing user resource data configuration |
| CN105912674A (en) * | 2016-04-13 | 2016-08-31 | 精硕世纪科技(北京)有限公司 | Method, device and system for noise reduction and classification of data |
| CN102955799B (en) * | 2011-08-25 | 2016-12-14 | 广州银禾网络通信有限公司 | Method and system for structured storage of cells in mobile communication network signaling |
| CN110609815A (en) * | 2019-07-30 | 2019-12-24 | 河钢股份有限公司承德分公司 | Data file storage and sharing method and terminal equipment |
| CN111091473A (en) * | 2019-11-25 | 2020-05-01 | 泰康保险集团股份有限公司 | Insurance problem analysis and processing method and device |
| CN111143177A (en) * | 2019-12-04 | 2020-05-12 | 中国建设银行股份有限公司 | Method, system, device and storage medium for collecting RMF III data of IBM host |
| CN115309700A (en) * | 2022-07-26 | 2022-11-08 | 浪潮软件股份有限公司 | Universal architecture implementation method for heterogeneous file service |
2008-01-17: CN application CN2008100041078A, patent CN101216839B (en), not active, Expired - Fee Related
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102045402A (en) * | 2010-12-28 | 2011-05-04 | 云海创想信息技术(北京)有限公司 | Data acquisition method and data acquisition system of distributed file system |
| WO2013007190A1 (en) * | 2011-07-11 | 2013-01-17 | Shao Kaiyi | Task execution system, data processing device, and task issuing device and method |
| CN102955799B (en) * | 2011-08-25 | 2016-12-14 | 广州银禾网络通信有限公司 | Method and system for structured storage of cells in mobile communication network signaling |
| CN102955799A (en) * | 2011-08-25 | 2013-03-06 | 广州银禾网络通信有限公司 | Method and system for structured storage of cells in mobile communication network signaling |
| CN105308558A (en) * | 2012-12-10 | 2016-02-03 | 维迪特克公司 | Rules based data processing system and method |
| CN105763363A (en) * | 2016-01-20 | 2016-07-13 | 曾戟 | Method and device for realizing user resource data configuration |
| CN105763363B (en) * | 2016-01-20 | 2019-05-28 | 曾戟 | Method and device for realizing user resource data configuration |
| CN105912674A (en) * | 2016-04-13 | 2016-08-31 | 精硕世纪科技(北京)有限公司 | Method, device and system for noise reduction and classification of data |
| CN110609815A (en) * | 2019-07-30 | 2019-12-24 | 河钢股份有限公司承德分公司 | Data file storage and sharing method and terminal equipment |
| CN110609815B (en) * | 2019-07-30 | 2022-05-24 | 河钢股份有限公司承德分公司 | Data file storage and sharing method and terminal equipment |
| CN111091473A (en) * | 2019-11-25 | 2020-05-01 | 泰康保险集团股份有限公司 | Insurance problem analysis and processing method and device |
| CN111143177A (en) * | 2019-12-04 | 2020-05-12 | 中国建设银行股份有限公司 | Method, system, device and storage medium for collecting RMF III data of IBM host |
| CN111143177B (en) * | 2019-12-04 | 2023-08-11 | 中国建设银行股份有限公司 | Method, system, device and storage medium for collecting RMF III data of IBM host |
| CN115309700A (en) * | 2022-07-26 | 2022-11-08 | 浪潮软件股份有限公司 | Universal architecture implementation method for heterogeneous file service |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101216839B (en) | 2011-09-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101216839B (en) | Network data centralization method and apparatus | |
| CN107133267B (en) | Method and device for querying elastic search cluster, electronic equipment and readable storage medium | |
| CN111522922B (en) | Log information query method, device, storage medium and computer equipment | |
| KR101365832B1 (en) | Data access layer class generator | |
| US9092408B2 (en) | Data listeners for type dependency processing | |
| CN103235820B (en) | Date storage method and device in a kind of group system | |
| CN110955674B (en) | Asynchronous exporting method and component based on java service | |
| CN106790718A (en) | Service call link analysis method and system | |
| CN113760734A (en) | A data preparation method and device, device, and storage medium | |
| CN113010483A (en) | Mass log management method and system | |
| CN114297204A (en) | Data storage and retrieval method and device for heterogeneous data source | |
| CN111460021B (en) | Data export method and device | |
| CN101309178A (en) | A method and device for analyzing log information of an automatic switching optical network system | |
| CN101989939A (en) | Real-time data providing method, server and network | |
| US8856152B2 (en) | Apparatus and method for visualizing data | |
| CN111343269A (en) | A data downloading method, apparatus, computer equipment and storage medium | |
| CN102946423B (en) | Data mapping and pushing system and method based on distributed system architecture | |
| CN116450890A (en) | Graph data processing method, device and system, electronic equipment and storage medium | |
| CN116340269A (en) | Method for acquiring and searching Flink task logs in real time based on elastic search | |
| CN116628111A (en) | Data processing conversion method and device | |
| CN115757570A (en) | A log data analysis method, device, electronic equipment and medium | |
| CN114595363A (en) | Business log processing method, system, storage medium and terminal based on lightweight architecture | |
| KR102200010B1 (en) | Method and device for providing result of joint analysis between data sources of different types | |
| CN111625300B (en) | Efficient data acquisition loading method and system | |
| CN111367638A (en) | Processing method and computer equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110921 Termination date: 20180117 |
