CN115827786A - Distributed data acquisition system based on stream data processing - Google Patents
- Publication number: CN115827786A
- Application number: CN202211726004.9A
- Authority: CN (China)
- Prior art keywords: data acquisition; system based; acquisition system; module; data processing
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a distributed data acquisition system based on streaming data processing, which comprises a fund management platform. A plurality of business subject libraries are configured in the fund management platform, and the fund management platform is connected with a scheduling device through a NiFi module. The scheduling device is configured with a business system, a financial database, an online banking interface and a software extension module. The scheduling device is provided with a plurality of processing modules, which comprise a JDBC data source processor, an SFTP file data source processor, a local file data source processor, a REST interface processor, an API interface processor, a Kafka message processor, an MQTT message processor and other extension processors. The scheduling device is also configured with an independent firewall module. As a result, an enterprise's investment in IT infrastructure can be reduced: deploying one or more distributed data acquisition and distribution devices based on streaming data processing is enough to do work that previously required more than ten devices.
Description
Technical Field
The invention relates to a data acquisition system, in particular to a distributed data acquisition system based on streaming data processing.
Background
The ultimate goal of digital transformation in enterprises and public institutions is to enable them to sense changes, analyze changes and make decisions in real time, so the automatic sensing and capture of data naturally becomes the premise and foundation of that transformation. The data collection and distribution process often requires several devices, or even more, to complete the relevant work.
Meanwhile, the new generation of fund management systems places more emphasis on visibility in addition to the existing controllability of funds, and accumulates the business information carried by fund flows through data collection and statistical analysis. This places higher demands on both the data integration capability and the response speed of the fund system.
In addition, mainstream data acquisition and distribution relies on ETL systems and devices, CDC-based data processing systems and devices, API-based data processing systems and devices, and data-service-bus-based data processing systems and devices. Integration through ETL systems and devices mainly aggregates database tables and views on a schedule; it lacks the capacity to process real-time data, and its C/S-mode management is inflexible. CDC-based and custom systems and devices mainly provide real-time data synchronization; they lack support for scheduled processing of large data volumes and require a large amount of development work. API-based and custom systems and devices mainly read and convert API data on a schedule; they require extensive development for each different interface, connect to different systems slowly, and are costly. Data access and distribution through service bus systems and devices is mainly suited to interface-type data management; it requires extensive modification of the connected systems and performs poorly on large data volumes and large files.
At present, the ultimate goal of digital transformation is to enable enterprises to sense changes, analyze changes and make decisions in real time. This requires collecting a large amount of data from peripheral systems, yet the prior art lacks a unified technology and unified monitoring and management and needs a large amount of development, which greatly increases the development cost of the system. Flexibility is poor and the technology stack is complex: because both C/S and B/S architectures are involved, development and management are difficult. Unified deployment and load balancing cannot be achieved for large-volume data processing.
In view of the above drawbacks, the present designer has actively pursued research and innovation to create a distributed data acquisition system based on streaming data processing that has greater industrial value.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a distributed data acquisition system based on streaming data processing.
The invention relates to a distributed data acquisition system based on streaming data processing, which comprises a fund management platform, wherein: a plurality of business subject libraries are configured in the fund management platform; the fund management platform is connected with a scheduling device through a NiFi module; the scheduling device is configured with a business system, a financial database, an online banking interface and a software extension module; the scheduling device is provided with a plurality of processing modules, which comprise a JDBC data source processor, an SFTP file data source processor, a local file data source processor, a REST interface processor, an API interface processor, a Kafka message processor, an MQTT message processor and other extension processors; and the scheduling device is configured with an independent firewall module.
Further, in the above distributed data acquisition system based on streaming data processing, the business subject libraries can be loaded and updated independently.
Further, in the above distributed data acquisition system based on streaming data processing, each processing module is configured with a process controller.
Further, in the above distributed data acquisition system based on streaming data processing, the scheduling device is configured with an independent local storage device.
Further, in the above distributed data acquisition system based on streaming data processing, the business system is provided with a plurality of mutually independent business subject libraries, each business subject library has an extraction function, and the business system is configured with a preset file.
Further, in the above distributed data acquisition system based on streaming data processing, the financial database is provided with a CDC support module.
Further, in the above distributed data acquisition system based on streaming data processing, Kafka and RabbitMQ software are preset in the software extension module.
Still further, in the above distributed data acquisition system based on streaming data processing, the scheduling device is configured with a communication feedback module and a repository module.
With the above scheme, the invention has at least the following advantages:
1. An enterprise's investment in IT infrastructure can be reduced: deploying one or more distributed data acquisition and distribution devices based on streaming data processing is enough to do work that previously required more than ten devices.
2. The data management response speed of enterprises and public institutions can be improved: through integrated graphical interface management, offline and real-time data acquisition flows that meet the requirements of different data source interfaces can be defined by drag and drop.
3. Through streaming data acquisition, data processing can be distributed to a plurality of device nodes for parallel processing, which meets the efficiency requirements of enterprises and public institutions handling massive data and improves their ability to sense data, analyze it and make decisions.
4. The device cluster can be expanded flexibly to meet the continuously growing data load of an enterprise.
5. The system provides a unified monitoring and management interface, so that administrators can manage and maintain the system more easily and staffing costs are reduced.
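The parallel distribution named in advantage 3 can be illustrated with a minimal Python sketch. It is not taken from the patent: the node names, the `assign_node` helper and the use of local worker threads in place of real cluster nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

# Hypothetical node names; a real cluster would supply its own node registry.
NODES = ["node-1", "node-2", "node-3"]

def assign_node(record_key: str, nodes=NODES) -> str:
    """Pick a node deterministically from a stable hash of the record key."""
    return nodes[zlib.crc32(record_key.encode("utf-8")) % len(nodes)]

def process_partition(node: str, records: list) -> tuple:
    """Stand-in for per-node work: count the records and sum their amounts."""
    return node, len(records), sum(r["amount"] for r in records)

def distribute_and_process(records: list) -> dict:
    """Partition records by account key, then process the partitions in parallel."""
    partitions = defaultdict(list)
    for record in records:
        partitions[assign_node(record["account"])].append(record)
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        results = list(pool.map(lambda item: process_partition(*item),
                                partitions.items()))
    return {node: {"count": count, "total": total}
            for node, count, total in results}

# Ten sample fund records with hypothetical account identifiers.
sample = [{"account": f"ACC{i:03d}", "amount": i} for i in range(10)]
summary = distribute_and_process(sample)
```

Hashing on a stable key keeps all records for one account on the same node, which is one common design choice when per-account ordering matters; other partitioning strategies would fit the text equally well.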
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
Fig. 1 is a schematic structural diagram of a distributed data acquisition system based on streaming data processing.
Fig. 2 is a schematic structural diagram of an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The distributed data acquisition system based on streaming data processing shown in figs. 1 and 2 comprises a fund management platform and is distinguished in that a plurality of business subject libraries are configured in the fund management platform. The fund management platform is connected with a scheduling device through a NiFi module. The scheduling device is configured with a business system, a financial database, an online banking interface and a software extension module. In implementation, the scheduling device is provided with a plurality of processing modules, which comprise a JDBC data source processor, an SFTP file data source processor, a local file data source processor, a REST interface processor, an API interface processor, a Kafka message processor, an MQTT message processor and other extension processors. Furthermore, to improve the security of the communication process, the scheduling device is configured with an independent firewall module. The other extension processors may be configured as needed and are not described further here.
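As an illustration of how a scheduling device might route each data source type to its processing module, the following Python sketch uses a simple registry. The registry, the `processor` decorator, the `dispatch` function and the configuration keys are hypothetical, and the handlers only describe the action they would take instead of opening real connections.

```python
from typing import Callable, Dict

# Registry mapping a source type name to the function that handles it.
PROCESSORS: Dict[str, Callable[[dict], str]] = {}

def processor(source_type: str):
    """Decorator that registers a handler under a source-type name."""
    def register(func):
        PROCESSORS[source_type] = func
        return func
    return register

@processor("jdbc")
def handle_jdbc(config: dict) -> str:
    # A real processor would open a database connection here.
    return f"query {config['table']} via {config['url']}"

@processor("sftp")
def handle_sftp(config: dict) -> str:
    # A real processor would download the file here.
    return f"fetch {config['path']} from {config['host']}"

@processor("kafka")
def handle_kafka(config: dict) -> str:
    # A real processor would subscribe to the topic here.
    return f"consume topic {config['topic']}"

def dispatch(source_type: str, config: dict) -> str:
    """Route a task to its processor, failing clearly for unknown types."""
    try:
        handler = PROCESSORS[source_type]
    except KeyError:
        raise ValueError(f"no processor registered for {source_type!r}") from None
    return handler(config)
```

Under this design, adding one of the "other extension processors" amounts to registering one more handler, without touching the dispatch logic.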
In a preferred embodiment of the invention, the business subject libraries can be loaded and updated independently. This ensures that the business subject libraries are updated in real time and can meet different data processing requirements. Meanwhile, each processing module is configured with a process controller, so that data processing can be coordinated and controlled and data transmission congestion is avoided. In addition, the scheduling device is configured with an independent local storage device, so that data can be read locally without waiting on a remote source, which shortens processing time.
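The text does not say how a process controller avoids congestion. One common technique, shown here purely as an assumption, is back pressure through a bounded queue: the upstream stage blocks once the buffer between two stages is full. The function name `run_pipeline` and the capacity value are illustrative.

```python
import queue
import threading

def run_pipeline(items, capacity=2):
    """Pass items from a producer to a consumer through a bounded queue."""
    buffer = queue.Queue(maxsize=capacity)  # put() blocks while the queue is full
    received = []
    max_depth = 0
    sentinel = object()  # marks the end of the stream

    def producer():
        for item in items:
            buffer.put(item)  # waits here instead of flooding the consumer
        buffer.put(sentinel)

    def consumer():
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, buffer.qsize())
            item = buffer.get()
            if item is sentinel:
                break
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received, max_depth

received, max_depth = run_pipeline(list(range(20)), capacity=2)
```

The queue never holds more than `capacity` items, so a slow consumer slows the producer down instead of letting unprocessed data pile up.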
Further, the business system is provided with a plurality of mutually independent business subject libraries, each of which has an extraction function, and the business system is configured with a preset file. To meet the need for stable operation of the financial database, the financial database is provided with a CDC support module. In implementation, Kafka and RabbitMQ software are preset in the software extension module. Meanwhile, the scheduling device is configured with a communication feedback module and a repository module.
In practical implementation, the fund management process needs support from various external data whose data source interfaces differ in type, so the scheme builds on NiFi's streaming data processing and distributed task processing. The data acquisition system's support for databases, data warehouses, MQ, CDC and file-type data sources is enhanced, and a fund data acquisition flow is completed by dragging, connecting and configuring in a web-based graphical interface, realizing functions such as data acquisition and processing.
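A flow built by dragging, connecting and configuring can be thought of as a set of steps plus the connections between them. The following Python sketch, which is not NiFi's flow format, models that idea with a hypothetical `STEPS` table and `CONNECTIONS` list and runs the steps in dependency order using the standard library's `graphlib` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flow: each step is a function over a list of row dicts.
STEPS = {
    "read": lambda rows: rows,
    "filter_positive": lambda rows: [r for r in rows if r["amount"] > 0],
    "to_cents": lambda rows: [{**r, "amount": round(r["amount"] * 100)}
                              for r in rows],
}
# Each connection is an (upstream, downstream) pair, like a line drawn in a UI.
CONNECTIONS = [("read", "filter_positive"), ("filter_positive", "to_cents")]

def execution_order(steps, connections):
    """Return step names so that every step runs after its upstream steps."""
    sorter = TopologicalSorter({name: set() for name in steps})
    for upstream, downstream in connections:
        sorter.add(downstream, upstream)
    return list(sorter.static_order())

def run_flow(rows, steps=STEPS, connections=CONNECTIONS):
    """Run every step; a step with no upstream receives the input rows."""
    upstream_of = {name: [] for name in steps}
    for upstream, downstream in connections:
        upstream_of[downstream].append(upstream)
    outputs = {}
    for name in execution_order(steps, connections):
        sources = upstream_of[name]
        if sources:
            incoming = [row for src in sources for row in outputs[src]]
        else:
            incoming = rows
        outputs[name] = steps[name](incoming)
    return outputs

result = run_flow([{"id": 1, "amount": 12.5}, {"id": 2, "amount": -3.0}])
```

Because the flow is plain data, a graphical editor only needs to edit the step table and the connection list; no step code changes when the wiring changes.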
Meanwhile, for better implementation of the invention, as shown in fig. 2, the database data source can be adapted through JDBC drivers to mainstream databases such as MySQL, Oracle, SQL Server, Gauss, PostgreSQL and Greenplum, and can be flexibly extended to other databases that provide a JDBC driver. In addition to MySQL, CDC support is extended to the CDC modes of mainstream databases such as Oracle and SQL Server. In addition to the CSV format, the file-type data source is extended to support text files with various types of delimiters and to adapt and process fixed-length text without delimiters. Collection from the message middleware Kafka and from message middleware conforming to the MQTT standard is also supported. The platform provides a unified fund data acquisition management and monitoring service based on task queues, task processing status, memory monitoring, load monitoring and the like. Automatic acquisition and monitoring of data from various data sources can therefore be completed on a single platform of the device without writing code.
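The two file formats mentioned above, delimited text and fixed-length text without delimiters, can be parsed along the following lines. This is a minimal Python sketch; the function names, the field layout and the sample data are assumptions for illustration only.

```python
import csv
import io

def parse_delimited(text: str, delimiter: str = ",") -> list:
    """Parse text whose first line is a header, using a one-character delimiter."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def parse_fixed_width(text: str, layout: list) -> list:
    """Parse fixed-length lines; layout is a list of (field_name, width) pairs."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record, start = {}, 0
        for name, width in layout:
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records

# Pipe-delimited sample with a header row.
pipe_text = "account|amount\nACC001|100.50\nACC002|20.00\n"

# Fixed-length sample: 10 characters of account, then 9 digits of cents.
LAYOUT = [("account", 10), ("amount_cents", 9)]
fixed_text = "\n".join([
    "ACC001".ljust(10) + "000010050",
    "ACC002".ljust(10) + "000002000",
])

delimited_rows = parse_delimited(pipe_text, delimiter="|")
fixed_rows = parse_fixed_width(fixed_text, LAYOUT)
```

Keeping the fixed-length layout as data means a new file format needs only a new list of field names and widths, which matches the no-code configuration the text describes.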
The working principle of the invention is as follows:
A web server hosts the HTTP-based command and control API and provides a unified management and monitoring interface. A flow controller acts as the brain of the operation: it allocates executable threads from a thread resource pool to the processors and handles the other resource management and scheduling tasks. The device is extensible, so processors and other components can be added. A flow file repository stores the state of each flow file in the currently active flow; its implementation is pluggable, and by default it is a persistent write-ahead log (WAL) stored on a designated disk partition. A content repository stores the actual byte content of the flow files in the currently active flow; its implementation is also pluggable, and the default approach is a fairly simple mechanism that stores the content data in the file system. Multiple storage paths may be specified so that different physical paths can be combined to avoid reaching the storage limit of a single physical partition. An event repository keeps all trace event data; this function is again pluggable, and by default the data may be stored on one or more physical partitions, with the event data under each path indexed and searchable.
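The write-ahead log and the multi-path content storage described above can be sketched in a few lines of Python. This is an illustration of the two ideas only, not NiFi's on-disk format: the class names, the JSON line format, the state labels and the round-robin choice of directory are all assumptions.

```python
import json
import os
import tempfile

class WriteAheadLog:
    """Append-only log of flow file state changes, replayed after a restart."""

    def __init__(self, path: str):
        self.path = path

    def append(self, flow_file_id: str, state: str) -> None:
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": flow_file_id, "state": state}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the entry to disk before returning

    def recover(self) -> dict:
        """Rebuild the latest state of every flow file from the log."""
        states = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    states[entry["id"]] = entry["state"]
        return states

class ContentStore:
    """Stores content bytes as files, rotating over several directories."""

    def __init__(self, directories: list):
        self.directories = directories
        self._next = 0

    def write(self, flow_file_id: str, content: bytes) -> str:
        directory = self.directories[self._next % len(self.directories)]
        self._next += 1
        path = os.path.join(directory, flow_file_id + ".bin")
        with open(path, "wb") as handle:
            handle.write(content)
        return path

# Demonstration in a temporary directory with two hypothetical storage paths.
workdir = tempfile.mkdtemp()
dirs = [os.path.join(workdir, name) for name in ("content-a", "content-b")]
for directory in dirs:
    os.makedirs(directory)
wal = WriteAheadLog(os.path.join(workdir, "flowfile.wal"))
store = ContentStore(dirs)
paths = {}
for fid, payload in [("ff-1", b"alpha"), ("ff-2", b"beta")]:
    paths[fid] = store.write(fid, payload)
    wal.append(fid, "QUEUED")
wal.append("ff-1", "PROCESSED")

# A fresh object reading the same file simulates recovery after a restart.
recovered = WriteAheadLog(wal.path).recover()
```

Separating small, frequently updated state (the log) from large, write-once content (the files) is the design point of the passage: the log can be replayed quickly, while content can be spread over several partitions.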
The above description is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that those skilled in the art can make many modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A distributed data acquisition system based on streaming data processing, comprising a fund management platform, characterized in that: a plurality of business subject libraries are configured in the fund management platform; the fund management platform is connected with a scheduling device through a NiFi module; the scheduling device is configured with a business system, a financial database, an online banking interface and a software extension module; the scheduling device is provided with a plurality of processing modules, the processing modules comprising a JDBC data source processor, an SFTP file data source processor, a local file data source processor, a REST interface processor, an API interface processor, a Kafka message processor, an MQTT message processor and other extension processors; and the scheduling device is configured with an independent firewall module.
2. The distributed data acquisition system based on streaming data processing of claim 1, wherein: the business subject libraries can be loaded and updated independently.
3. The distributed data acquisition system based on streaming data processing of claim 1, wherein: each processing module is configured with a process controller.
4. The distributed data acquisition system based on streaming data processing of claim 1, wherein: the scheduling device is configured with an independent local storage device.
5. The distributed data acquisition system based on streaming data processing of claim 1, wherein: the business system is provided with a plurality of mutually independent business subject libraries, each business subject library has an extraction function, and the business system is configured with a preset file.
6. The distributed data acquisition system based on streaming data processing of claim 1, wherein: the financial database is provided with a CDC support module.
7. The distributed data acquisition system based on streaming data processing of claim 1, wherein: Kafka and RabbitMQ software are preset in the software extension module.
8. The distributed data acquisition system based on streaming data processing of claim 1, wherein: the scheduling device is configured with a communication feedback module and a repository module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211726004.9A CN115827786A (en) | 2022-12-30 | 2022-12-30 | Distributed data acquisition system based on stream data processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211726004.9A CN115827786A (en) | 2022-12-30 | 2022-12-30 | Distributed data acquisition system based on stream data processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115827786A (en) | 2023-03-21 |
Family
ID=85519683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211726004.9A Pending CN115827786A (en) | 2022-12-30 | 2022-12-30 | Distributed data acquisition system based on stream data processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115827786A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492040B (en) | System suitable for processing mass short message data in data center | |
CN114443435B (en) | A performance monitoring and alarming method and alarm system for container microservices | |
CN107734066A (en) | A kind of data center's total management system services administering method | |
CN105677836A (en) | Big data processing and solving system simultaneously supporting offline data and real-time online data | |
CN110472102A (en) | A kind of data processing method, device, equipment and storage medium | |
CN102937964B (en) | Intelligent data service method based on distributed system | |
CN103440290A (en) | Big data loading system and method | |
CN104699723A (en) | Data exchange adapter and system and method for synchronizing data among heterogeneous systems | |
CN108009258A (en) | It is a kind of can Configuration Online data collection and analysis platform | |
CN114090580A (en) | Data processing method, device, equipment, storage medium and product | |
CN116777182B (en) | Task dispatch method for semiconductor wafer manufacturing | |
CN110598051A (en) | Power industry monitoring system, method and device | |
US5093782A (en) | Real time event driven database management system | |
CN109977125A (en) | A kind of big data safety analysis plateform system based on network security | |
CN115292414A (en) | Method for synchronizing service data to data bins | |
CN114138612A (en) | Application monitoring system and method for multi-place multi-activity data center | |
CN117149873A (en) | Data lake service platform construction method based on flow batch integration | |
CN111708895B (en) | Knowledge graph system construction method and device | |
CN102929619A (en) | Process automation software development system across hardware platform | |
CN109829094A (en) | Distributed reptile system | |
CN106557492A (en) | A kind of method of data synchronization and device | |
CN117056303B (en) | Data storage method and device suitable for military operation big data | |
CN115827786A (en) | Distributed data acquisition system based on stream data processing | |
CN116069480B (en) | Processor and computing device | |
CN117076426A (en) | Traffic intelligent engine system construction method and device based on flow batch integration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||