CN115756549A - Method and device for downloading data of big data middlebox and storage medium - Google Patents
- Publication number: CN115756549A (application number CN202211509504.7A)
- Authority: CN (China)
- Prior art keywords: data, downloading, task, file, big
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of data downloading for big data middle platforms, and relates in particular to a method, a device and a storage medium for downloading data from a big data middle platform. The method comprises the following steps: S1, data source management: data sources are added, deleted, modified and queried, and are managed uniformly through data source configuration items; S2, developing a data downloading program based on a big data engine; S3, submitting the data downloading task, i.e., submitting the engine-based data downloading program as a task; S4, monitoring the data downloading task with a big data task monitoring component; and S5, downloading the data file: the result data of the data downloading task is downloaded from the temporary data file storage system directory to the client. Purpose: to manage data sources uniformly, to make the downloading of massive data unaffected by data source versions and types, and to improve data downloading efficiency.
Description
Technical Field
The invention belongs to the technical field of data downloading for big data middle platforms, and relates in particular to a method, a device and a storage medium for downloading data from a big data middle platform.
Background
As data volumes grow rapidly, even exponentially, massive data inevitably becomes difficult to understand, acquire, process and organize. Big data processing technology covers data management, data governance, data organization, data services and related areas. The data downloading service is an important part of the data management and data service functions of a big data middle platform: it provides data downloads according to data requirements and completes data exchange and data downloading with other systems and platforms.
A data middle platform must download massive data from many kinds of data sources, and when massive data is downloaded by many tasks in parallel, ordinary download methods cannot meet the requirements of this scenario. In addition, download functionality is usually tightly coupled to specific business scenarios, so download support for each kind of data source has to be integrated separately for each business requirement, which leads to repeated development and duplicated dependencies.
Disclosure of Invention
The purpose of the invention is to provide a method, a device and a storage medium for downloading big data middle platform data that manage data sources uniformly, make the downloading of massive data unaffected by data source versions and types, and improve data downloading efficiency.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
In a first aspect, the present application provides a method for downloading big data middle platform data, including the following steps:
S1, data source management: data sources are added, deleted, modified and queried, and are managed uniformly through data source configuration items;
S2, developing a data downloading program based on a big data engine;
S3, submitting the data downloading task, i.e., submitting the engine-based data downloading program as a task;
S4, monitoring the data downloading task through a big data task monitoring component;
and S5, downloading the data file: in the data downloading task, the result data is downloaded from the temporary data file storage system directory to the client.
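The five steps can be read as one orchestration flow. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the function name `run_download`, the in-memory `staging` dictionary standing in for the temporary data file storage system, and the `submit` callback standing in for the engine or third-party submission are all hypothetical.

```python
from typing import Callable, Dict, List

Rows = List[dict]

def run_download(
    sources: Dict[str, dict],                        # S1: managed data source configuration items
    source_name: str,
    program: Callable[[dict], Rows],                 # S2: engine-based download program (hypothetical)
    submit: Callable[[Callable[[], Rows]], Rows],    # S3: engine or third-party submission (hypothetical)
    staging: Dict[str, Rows],                        # stand-in for the temporary storage directory
    task_id: str,
) -> Rows:
    config = sources[source_name]                    # S1: look up the managed configuration
    status = {"state": "SUBMITTED"}

    def job() -> Rows:
        status["state"] = "RUNNING"                  # S4: state visible to the monitor
        rows = program(config)
        status["state"] = "FINISHED"
        return rows

    staging[task_id] = submit(job)                   # S3: hand the job to the executor
    if status["state"] != "FINISHED":                # S4: monitor confirms completion
        raise RuntimeError(f"task {task_id} did not finish")
    return list(staging[task_id])                    # S5: copy the staged result to the client
```

In a real deployment `submit` would hand the job to Spark, Flink or a service such as Livy and return asynchronously; here it simply calls the job, e.g. `run_download(sources, "orders", my_program, lambda job: job(), staging, "t1")`.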
With reference to the first aspect, in some optional embodiments of step S1, the data sources include, but are not limited to: databases, including but not limited to MySQL, Oracle, HBase, Hive and Elasticsearch (es); data files, including but not limited to HDFS, FTP and S3; interfaces, including but not limited to REST APIs and WebSockets; and message buses, including but not limited to Kafka, RabbitMQ and Pulsar.
In some alternative embodiments, in combination with the first aspect, the method further comprises,
through the data source configuration items, the data sources are managed uniformly by configuring connection addresses, user names, passwords or authentication key files, and the data sources can be added, deleted, modified and queried.
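As a minimal sketch of what add/delete/modify/query over configuration items could look like, assuming an in-memory store: the class names `DataSourceConfig` and `DataSourceRegistry` and their fields are hypothetical, chosen only to mirror the items named in the text (connection address, user name, password, authentication key file).

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

@dataclass(frozen=True)
class DataSourceConfig:
    name: str
    kind: str                         # e.g. "mysql", "hive", "hbase", "kafka"
    address: str                      # connection address (JDBC URL, ZooKeeper quorum, ...)
    username: Optional[str] = None
    password: Optional[str] = None
    key_file: Optional[str] = None    # authentication key file, e.g. a Kerberos keytab

class DataSourceRegistry:
    """Hypothetical in-memory add/delete/modify/query over data source configuration items."""

    def __init__(self) -> None:
        self._items: Dict[str, DataSourceConfig] = {}

    def add(self, cfg: DataSourceConfig) -> None:
        if cfg.name in self._items:
            raise KeyError(f"data source {cfg.name!r} already exists")
        self._items[cfg.name] = cfg

    def delete(self, name: str) -> None:
        del self._items[name]

    def modify(self, name: str, **changes: str) -> DataSourceConfig:
        if "name" in changes:
            raise ValueError("renaming is not supported; delete and re-add instead")
        updated = replace(self._items[name], **changes)   # frozen: build a new record
        self._items[name] = updated
        return updated

    def query(self, kind: Optional[str] = None) -> List[DataSourceConfig]:
        return [c for c in self._items.values() if kind is None or c.kind == kind]
```

A production registry would persist these records and encrypt credentials; the in-memory dictionary only illustrates the four operations.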
In some alternative embodiments, in combination with the first aspect, the method further comprises,
developing download programs for different data sources on different big data engines; according to the task execution rule, the download program obtains the parameters required to connect to the data source, the parameters required to connect to the temporary data file storage system, and the data download requirement information; after the download is executed, the downloaded data is written uniformly into the temporary data file system to obtain a download result data file, using the connection configuration items.
In some alternative embodiments, in combination with the first aspect, the method further comprises,
the download result data file comprises a data folder and a JSON description file, wherein the data folder contains one or more compressed data files, and the JSON description file includes, but is not limited to, the structured data field names, data types, data field descriptions, the data task id, the number of compressed data files and the data task serial number;
the JSON description file further records, for each compressed data file in the data folder, the data file suffix, compression format, file name, file size and number of data records in the file.
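The description file can be illustrated with a short Python sketch that assembles it from the items listed above. The JSON key names (`task_id`, `task_serial`, `fields`, `file_count`, `files`, `record_count`, ...) are hypothetical; the text names the information but not a concrete schema.

```python
import json
from typing import Dict, List

def build_description(task_id: str, serial: int,
                      schema: List[Dict[str, str]],
                      files: List[Dict[str, object]]) -> str:
    """Assemble a JSON description for one download result (key names are hypothetical)."""
    doc = {
        "task_id": task_id,
        "task_serial": serial,
        "fields": schema,              # [{"name": ..., "type": ..., "description": ...}]
        "file_count": len(files),      # number of compressed data files
        "files": [
            {
                "name": f["name"],
                "suffix": f["suffix"],
                "compression": f["compression"],
                "size_bytes": f["size_bytes"],
                "record_count": f["record_count"],
            }
            for f in files
        ],
    }
    return json.dumps(doc, ensure_ascii=False, indent=2)
```

Using `ensure_ascii=False` keeps non-ASCII field descriptions readable in the file.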
In some alternative embodiments, in combination with the first aspect, the method further comprises,
when submitting the data downloading program as a task, a big data engine is invoked to perform the submission, or a third-party service is invoked to perform it;
the big data engine or third-party service is invoked with the data source configuration information, the temporary download storage path, the file system configuration information and the data download rule passed to the data downloading program.
In some alternative embodiments, in combination with the first aspect, the method further comprises,
when monitoring the data downloading task, the task resource management component of the big data cluster is used: its task resource interface is called to monitor the resource usage, execution state, execution result and execution log of the data downloading task.
In some alternative embodiments, in combination with the first aspect, the method further comprises,
when the submitted download task has already been executed, the task-related information in the JSON description file of the download result data file is modified, and the modified description and the data files are downloaded to the client;
when the data downloading task has not yet been executed, the data downloading program is started, and once monitoring shows that the task has completed, the data is downloaded from the temporary data file storage system directory to the client.
In a second aspect, the present application further discloses a device for downloading big data middle platform data, the device comprising:
a data source module, configured to add, delete, modify and query data sources and to manage them uniformly through data source configuration items;
an execution module, configured to develop a data downloading program on a big data engine and to submit the data downloading program as a task;
a monitoring module, configured to monitor the data downloading program task through a big data task monitoring component;
and a downloading module, configured to download the result data files from the temporary data file storage system directory to the client.
In a third aspect, the present application also discloses a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform the method as described above.
The technical solution of the invention provides the following advantages:
1. Data downloading from multiple data sources is unified: data sources are managed in one place within the big data middle platform, where they can be added, deleted, modified and queried, which makes their use clear, convenient and fast. Engine-based download programs are developed for different data sources, executed by a unified task scheduling service and monitored by a unified task monitoring platform, which enables unified management and speeds up development iterations;
2. Download programs based on a big data engine are not bottlenecked by the size of the data to be downloaded; they can download and process massive data and improve downloading efficiency;
3. After a download program task has run, the downloaded data files can be written to and staged in various data file systems, so the same download task does not need to rerun the download program: data from a completed download task can be downloaded directly from the data file system, which is more convenient;
4. Unified download task management: engine-based download tasks can be managed uniformly by the big data cluster task scheduler, with unified monitoring of execution resources, execution progress, execution state and so on; the data download service only needs to call the cluster task manager interface to monitor task state;
5. Engine-based download tasks allow task programs to be developed and extended flexibly: different download programs can be developed for different data sources, and different versions of a download program for different versions of a data source, with no coupling or mutual interference between the download programs of different data sources and versions.
Drawings
The present application is further illustrated by the non-limiting examples in the drawings. It should be understood that the following drawings illustrate only certain embodiments of the application and should therefore not be regarded as limiting its scope; those skilled in the art may derive further drawings from them without inventive effort.
Fig. 1 is a first flowchart of the download method according to an embodiment of the present application;
Fig. 2 is a second flowchart of the download method according to an embodiment of the present application;
Fig. 3 is a block diagram of a downloading device according to an embodiment of the present application.
Reference numerals of the main components: downloading device 200, data source module 210, execution module 220, monitoring module 230 and downloading module 240.
Detailed Description
The present application will be described in detail with reference to the drawings and specific embodiments, wherein like reference numerals are used for similar or identical parts in the drawings or description, and implementations not shown or described in the drawings are known to those of ordinary skill in the art. In the description of the present application, the terms "first," "second," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
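Referring to fig. 1, an embodiment of the present application provides a method for downloading big data middle platform data, including the following steps:
step 110, data source management: data sources are added, deleted, modified and queried, and are managed uniformly through data source configuration items;
step 120, developing a data downloading program based on a big data engine;
step 130, submitting the data downloading task, i.e., submitting the engine-based data downloading program as a task;
step 140, monitoring the data downloading task through a big data task monitoring component;
and step 150, downloading the data file: in the data downloading task, the result data is downloaded from the temporary data file storage system directory to the client.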
With this design, data downloading from multiple data sources is unified, data sources are managed uniformly in the big data middle platform and can be added, deleted, modified and queried, which makes using data sources clear, convenient and fast. Because the download programs for different data sources are developed on a big data engine, executed by a unified task scheduling service and monitored by a unified task monitoring platform, the size of the downloaded data is not a bottleneck, massive data can be downloaded and processed, downloading efficiency improves, management is unified and development iterations are faster.
As an alternative embodiment, in step 110 the data sources include, but are not limited to, databases, data files, interfaces and message buses, wherein the databases include but are not limited to MySQL, Oracle, Hive, HBase and Elasticsearch (es); the data files include but are not limited to HDFS, FTP and S3; the interfaces include but are not limited to REST APIs and WebSockets; and the message buses include but are not limited to Kafka, RabbitMQ and Pulsar.
It can be understood that integrating the various types of data sources in one place makes it easier to serve different downloads, avoids developing different download programs for different data sources, and achieves unified management of resources.
As an alternative embodiment, the method may further comprise,
through the data source configuration items, the data sources are managed uniformly by configuring connection addresses, user names, passwords or authentication key files, and the data sources can be added, deleted, modified and queried.
For example, when the data source is Hive, a URL connection address, a user name, and a password or Hive Kerberos authentication key file are configured; if the Hive data source service has no driver, the Hive JDBC driver is configured, and if it already has one, no driver needs to be configured.
When the data source is MySQL, a URL connection address, a user name and a password are configured; if the MySQL data source service has no driver, the MySQL JDBC driver is configured, and if it already has one, no driver is configured.
When the data source is HBase, a ZooKeeper connection address, a user name, and a password or HBase Kerberos authentication key file are configured.
The above configurations are only exemplary: different connection addresses, user names, passwords or authentication key files are configured for different data sources so that the data sources can be modified and queried, and the download service can log in to each data source to download different data.
Data sources may also be managed uniformly through configuration files, such as authentication key files, authentication ticket files or other authentication files commonly used in the art.
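The per-source rules above (Hive, MySQL, HBase) can be expressed as a small validation table. This Python sketch is illustrative only: the item keys (`url`, `zookeeper`, `keytab`, `jdbc_driver`) and the `validate` helper are hypothetical names, and only the three source types from the examples are covered.

```python
from typing import Dict, List

# Hypothetical required items per source type, following the examples in the text.
REQUIRED: Dict[str, List[str]] = {
    "hive": ["url", "username"],
    "mysql": ["url", "username", "password"],
    "hbase": ["zookeeper", "username"],
}
# Hive and HBase accept either a password or a Kerberos authentication key file.
CREDENTIAL_ALTERNATIVES = {"hive": ("password", "keytab"), "hbase": ("password", "keytab")}
# Types whose JDBC driver must be supplied when the service does not already bundle one.
JDBC_TYPES = {"hive", "mysql"}

def validate(kind: str, cfg: Dict[str, str], service_has_driver: bool) -> List[str]:
    """Return the missing configuration items (an empty list means the config is usable)."""
    missing = [k for k in REQUIRED[kind] if not cfg.get(k)]
    alts = CREDENTIAL_ALTERNATIVES.get(kind)
    if alts and not any(cfg.get(a) for a in alts):
        missing.append(" or ".join(alts))
    if kind in JDBC_TYPES and not service_has_driver and not cfg.get("jdbc_driver"):
        missing.append("jdbc_driver")
    return missing
```

Keeping the rules in data rather than in branches makes adding a new source type a one-line change.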
As an alternative embodiment, the method may further comprise,
in step 120, download programs are developed for different data sources on different big data engines; according to the task execution rule, each download program obtains the parameters required to connect to the data source, the parameters required to connect to the temporary data file storage system, and the data download requirement information; after the download is executed, the downloaded data is written uniformly into the temporary data file system to obtain a download result data file, using the connection configuration items or configuration files.
In this embodiment, the engines include, but are not limited to, Spark, Flink and MapReduce (mr); a data downloading program can be developed for each engine, so that the developed programs can download data from databases, data files, interfaces, message buses and other data sources.
In this embodiment, the developed data downloading program obtains the necessary parameters from the task execution rule as follows:
the configuration items of the corresponding data source are obtained, and data is read from that source. For example, when the data source is a socket, the connection address, user name, and password or authentication key file of the socket data source are obtained, and data is read from it. When the target is a sink data storage system, its connection address, user name, and password or authentication key file are obtained, and the data to be downloaded is written into that storage system, so that the required data is obtained.
It can be understood that, for each data source, the developed data downloading program controls the data content to be downloaded and the basic descriptive information of the data, and the developed program can also be run for offline downloads.
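Parameter resolution from the task execution rule can be sketched as below, assuming the rule simply names a source, a staging file system, a requirement (here a query string) and a staging path. `TaskExecutionRule`, `resolve` and the key names are hypothetical; the text does not fix a rule format.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class TaskExecutionRule:
    """Hypothetical shape of a task execution rule handed to a download program."""
    task_id: str
    source: str            # name of the managed data source to read from
    staging_fs: str        # name of the temporary data file storage system to write to
    query: str             # data download requirement, e.g. a SQL statement
    staging_path: str      # directory on the staging file system

CONNECTION_KEYS = ("address", "username", "password", "key_file")

def resolve(rule: TaskExecutionRule, configs: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Collect the parameters a download program needs before it starts."""
    def conn(name: str) -> Dict[str, str]:
        cfg = configs[name]
        # keep only the connection items; other settings stay in the registry
        return {k: cfg[k] for k in CONNECTION_KEYS if k in cfg}

    return {
        "source": conn(rule.source),
        "sink": conn(rule.staging_fs),
        "requirement": {
            "query": rule.query,
            "output": f"{rule.staging_path.rstrip('/')}/{rule.task_id}",
        },
    }
```

Writing each task under its own `task_id` directory is one way to keep parallel download tasks from overwriting each other's staged files.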
As an alternative embodiment, the method may further comprise,
in step 120, the download result data file includes a data folder and a JSON description file, wherein the data folder contains one or more compressed data files, and the JSON description file includes, but is not limited to, the structured data field names, data types, data field descriptions, the data task id, the number of compressed data files and the data task serial number;
the JSON description file further records, for each compressed data file in the data folder, the data file suffix, compression format, file name, file size and number of data records in the file.
It can be understood that, when the task execution rule of the download program is developed, it controls the compression method of the downloaded data, the size of each data file and the basic information recorded in the JSON description file; the data field names, data types, field descriptions, task id and similar information are taken from the data download rule, so that the download task can be executed accurately.
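Controlling compression and per-file size can be illustrated with a chunk-and-gzip sketch. Assumptions: the size limit is expressed as a maximum record count (a byte limit would work the same way), gzip is the chosen compression format, and `write_chunks` is a hypothetical helper that returns chunks in memory instead of writing them to the staging file system.

```python
import gzip
from typing import Iterable, List, Tuple

def write_chunks(lines: Iterable[str], max_records: int) -> List[Tuple[bytes, int]]:
    """Split rows into gzip-compressed chunks of at most max_records rows each.

    Returns (compressed_bytes, record_count) pairs; a real program would write
    each chunk to the staging file system instead of keeping it in memory.
    """
    chunks: List[Tuple[bytes, int]] = []
    buf: List[str] = []

    def flush() -> None:
        if buf:
            chunks.append((gzip.compress("\n".join(buf).encode("utf-8")), len(buf)))
            buf.clear()

    for line in lines:
        buf.append(line)
        if len(buf) >= max_records:
            flush()
    flush()                       # emit the final, possibly shorter, chunk
    return chunks
```

The record counts returned here are exactly what the JSON description file needs for its per-file "number of data records" entry.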
As an alternative embodiment, the method may further comprise,
in step 130, when submitting the data downloading program as a task, a big data engine is invoked to perform the submission, or a third-party service is invoked to perform it;
the big data engine or third-party service is invoked with the data source configuration information, the temporary download storage path, the file system configuration information and the data download rule passed to the data downloading program.
In this embodiment, when the Flink engine is used, the data downloading program may be submitted through the client; when the Spark engine is used, it may be submitted through the third-party service Livy; and so on.
It is understood that the data source configuration information and the file system configuration information are the connection addresses, user names, passwords, authentication key files and the like of databases, data files, interfaces, message buses and other sources. The data download rule is the task execution rule of the download program.
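The two submission paths can be sketched as request builders. `livy_batch_request` builds a body for Apache Livy's `POST /batches` endpoint using its `file`, `className`, `args` and `name` fields, and `flink_run_command` builds an argument vector for the `flink run` CLI (`-d` detached, `-c` entry class). The jar paths, class names and the `--source`/`--staging-path`/`--fs`/`--rule` flags passed to the download program are hypothetical.

```python
from typing import Dict, List

def task_args(source_cfg: str, staging_path: str, fs_cfg: str, rule: str) -> List[str]:
    """Pass the four inputs named in the text as program arguments (flag names are hypothetical)."""
    return ["--source", source_cfg, "--staging-path", staging_path, "--fs", fs_cfg, "--rule", rule]

def livy_batch_request(jar: str, main_class: str, args: List[str], name: str) -> Dict[str, object]:
    """JSON body for POST /batches on an Apache Livy server (Spark path)."""
    return {"file": jar, "className": main_class, "args": args, "name": name}

def flink_run_command(jar: str, main_class: str, args: List[str]) -> List[str]:
    """Argument vector for the Flink CLI in detached mode (client submission path)."""
    return ["flink", "run", "-d", "-c", main_class, jar, *args]
```

The body would be sent with an HTTP client to the Livy server, and the command run with `subprocess.run`; both steps are left out so the sketch has no side effects.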
As an alternative embodiment, the method may further comprise,
in step 140, when monitoring the data downloading task, the task resource management component of the big data cluster is used: its task resource interface is called to monitor the resource usage, execution state, execution result and execution log of the data downloading task.
In this embodiment, the big data cluster task resource management component includes, but is not limited to, YARN. Taking YARN as an example, YARN natively provides a RESTful query API; the download service calls this interface over HTTP at regular intervals with the task id to obtain the task progress. Specifically, the interface is queried with the task id every second; for example, the first query returns 10%, the second returns 20%, and so on until the data downloading task is complete.
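Polling YARN can be sketched against the ResourceManager's application endpoint (`/ws/v1/cluster/apps/{appid}`), whose `app` object carries `state`, `finalStatus` and `progress`. The ResourceManager address and the one-second interval are assumptions, and `fetch` is injectable so the loop can be exercised without a cluster.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL = {"FINISHED", "FAILED", "KILLED"}     # YARN application states that end polling

def fetch_app(rm_url: str, app_id: str) -> dict:
    """GET the application report from the YARN ResourceManager REST API."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}", timeout=10) as resp:
        return json.load(resp)["app"]

def poll(app_id: str, fetch: Callable[[str], dict], interval_s: float = 1.0,
         on_progress: Callable[[float], None] = print) -> dict:
    """Query the task once per interval until YARN reports a terminal state."""
    while True:
        app = fetch(app_id)
        on_progress(app.get("progress", 0.0))   # progress is a percentage, 0 to 100
        if app["state"] in TERMINAL:
            return app                          # finalStatus tells success from failure
        time.sleep(interval_s)
```

Usage against a real cluster might look like `poll(app_id, lambda a: fetch_app("http://rm-host:8088", a))`; note that `FINISHED` only means the application ended, so `finalStatus` should be checked for `SUCCEEDED`.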
As an alternative embodiment, the method may further comprise,
in step 150, when the submitted download task has already been executed, the task-related information in the JSON description file of the download result data file is modified, and the modified description and the data files are downloaded to the client;
when the data downloading task has not yet been executed, the data downloading program is started, and once monitoring shows that the task has completed, the data is downloaded from the temporary data file storage system directory to the client.
In this embodiment, when data has already been downloaded once, the second and later downloads deliver the same data files but different description information in the JSON description file: the task id, the running account, the name of the downloading user, the download time and similar information are updated. The system updates the JSON description file automatically according to the task rule, so the data task that generated the files does not need to be executed again.
It can be understood that when previously downloaded data is requested again, it does not need to be fetched from the data source again; it only needs to be looked up in the temporary data file storage system directory and downloaded directly from there.
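The reuse rule can be sketched as follows, assuming identical requests share a hypothetical cache `key` and the staged result is an in-memory dict with `files` and `description` entries. As the text describes, the data files are reused untouched and only the task-related fields of the description are rewritten.

```python
from datetime import datetime, timezone
from typing import Callable, Dict

def serve_download(key: str, staging: Dict[str, dict], run_program: Callable[[], dict],
                   task_id: str, user: str, account: str) -> dict:
    """Reuse a staged result when one exists; otherwise run the download program first."""
    if key not in staging:
        staging[key] = run_program()            # first request: execute and stage the result
    entry = staging[key]                        # data files are reused as-is
    entry["description"].update({               # only task-related metadata changes
        "task_id": task_id,
        "user": user,
        "account": account,
        "download_time": datetime.now(timezone.utc).isoformat(),
    })
    return entry
```

Deciding when two requests are "the same" (the cache key) and when a staged result is stale are design choices the text leaves open.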
Referring to fig. 2, the method for downloading big data middle platform data is described as follows:
S1, data source management: different data sources such as databases, data files, interfaces and message buses are managed uniformly by setting configuration items such as their connection addresses, user names, passwords or authentication key files, and the content of the data sources is added, deleted, modified and queried through these configuration items;
S2, developing a data downloading program on a big data engine such as Spark, Flink or MapReduce: download programs are developed for the different data sources to download their data. Together with the program, its task execution rule is developed, which controls the data content to be downloaded and the basic descriptive information of the data. From the task execution rule, the program obtains the parameters required to connect to the data source, the parameters required to connect to the temporary data file storage system, and the data download requirement information; the required parameters include, but are not limited to, a connection address, a user name, and a password or authentication key file. After execution, the files are written uniformly into the temporary data file system, using the connection configuration items, to obtain the download result data file. The download result data file comprises a data folder and a JSON description file: the data folder contains one or more compressed data files, and the JSON description file includes, but is not limited to, the structured data field names, data types, data field descriptions, data task id, number of compressed data files and data task serial number, as well as, for each compressed data file in the data folder, its suffix, compression format, file name, file size and number of data records;
S3, submitting the data downloading task: a big data engine or third-party service is invoked to execute it, using the data source configuration information, temporary download storage path, file system configuration information and data download rule passed to the data downloading program;
S4, monitoring the data downloading task: the task resource management component of the big data cluster monitors the data downloading tasks submitted to the cluster; the data download service calls the task resource interface to monitor the resource usage, execution state, execution result, execution log and other monitoring items of each data downloading task;
and S5, data download service: the result data is downloaded from the temporary data file storage system directory to the client. If the submitted download task has already been executed, the task-related information in the JSON description file of the data result folder is modified, and the modified description and the data files are downloaded to the client; if the data downloading task has not been executed, the data downloading program is started, and once monitoring shows that the task has completed, the data is downloaded from the temporary data file storage system directory to the client.
In the above embodiment, different data sources are managed uniformly through configuration items, and data downloading programs for these sources are developed on different engines so that they can download data from different data sources. The data source configuration information, temporary download storage path, file system configuration information and data download rule passed to each data downloading program allow a big data engine or third-party service to execute it and deliver the data to the client. As a result, downloading massive data is not affected by data source versions or types, and data downloading efficiency is improved.
Referring to fig. 3, this embodiment further provides a device for downloading big data. The downloading device 200 includes at least one software functional module that can be stored in a storage module in the form of software or firmware, or built into an operating system (OS), such as the software functional modules and computer programs included in the downloading device 200.
The downloading device 200 may include a data source module 210, an execution module 220, a monitoring module 230 and a downloading module 240, with the following functions:
the data source module 210 is configured to add, delete, modify and query data sources and to manage them uniformly through data source configuration items;
the execution module 220 is configured to develop a data downloading program on a big data engine and to submit the data downloading program as a task;
the monitoring module 230 is configured to monitor the data downloading program task through a big data task monitoring component;
and the downloading module 240 is configured to download the result data from the temporary data file storage system directory to the client.
The data source module 210 sets configuration items to manage different data sources uniformly; the execution module 220 stores and develops the data downloading program and submits the data downloading task to the data source module 210; according to the task, the location and description of the corresponding data are found in the data source, and the downloading module 240 downloads the data to the client quickly. During downloading, the monitoring module 230 monitors the download progress and completion status and displays the progress in real time.
Optionally, the data source module 210 may also be used to,
configure, for data sources such as databases, data files, interfaces and message buses, configuration items including but not limited to connection addresses, user names, passwords or authentication key files, and add configuration files, so that different data sources are managed uniformly and the data in them can be added, deleted, modified and queried.
Optionally, the execution module 220 may be further configured to,
develop data downloading programs on different big data engines together with their task execution rules; according to the task execution rule, obtain the parameters required to connect to the data source, the parameters required to connect to the temporary data file storage system, and the data download requirement information; after the download is executed, write the downloaded data uniformly into the temporary data file system to obtain a download result data file, using the connection configuration items.
Optionally, the execution module 220 may be further configured to,
produce a download result data file containing a data folder and a JSON description file, wherein the data folder contains one or more compressed data files, and the JSON description file includes, but is not limited to, the structured data field names, data types, data field descriptions, data task id, number of compressed data files and data task serial number; the JSON description file further records, for each compressed data file in the data folder, its suffix, compression format, file name, file size and number of data records.
Optionally, the download module 240 may also be configured to,
when the submitted download task has finished executing, modify the task-related information in the json description file of the download result data file, and download the modified description file together with the data files to the client;
and when the data download task has not yet been executed, start the data downloading program and, once monitoring shows that the task has finished, download the data from the temporary data file storage system directory to the client.
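The two branches above can be expressed as a single function. This is a sketch under assumptions: `ResultStore` is a hypothetical interface standing in for the scheduler and the temporary storage directory, the state names and the `description.json` file name are invented for illustration, and the description file is updated on both branches, although the text states this explicitly only for the completed-task case.

```python
import json
from typing import Dict, List, Protocol

DESCRIPTION_NAME = "description.json"  # hypothetical file name


class ResultStore(Protocol):
    """Hypothetical view of the scheduler plus the temporary storage directory."""
    def state(self, task_id: str) -> str: ...   # "NOT_STARTED", "RUNNING", "FINISHED", "FAILED"
    def start(self, task_id: str) -> None: ...  # launch the data downloading program
    def wait(self, task_id: str) -> str: ...    # block until terminal; return final state
    def list_data_files(self, task_id: str) -> List[str]: ...
    def read(self, task_id: str, name: str) -> bytes: ...


def download_to_client(task_id: str, store: ResultStore,
                       task_updates: Dict[str, str]) -> Dict[str, bytes]:
    """Return the files to send to the client, keyed by file name."""
    state = store.state(task_id)
    if state != "FINISHED":
        if state == "NOT_STARTED":
            store.start(task_id)       # task not executed yet: start the program
        state = store.wait(task_id)    # rely on monitoring until it finishes
        if state != "FINISHED":
            raise RuntimeError(f"task {task_id} ended in state {state}")
    # Modify the task-related information in the json description file.
    desc = json.loads(store.read(task_id, DESCRIPTION_NAME))
    desc.update(task_updates)
    bundle = {DESCRIPTION_NAME: json.dumps(desc, ensure_ascii=False).encode("utf-8")}
    for name in store.list_data_files(task_id):
        bundle[name] = store.read(task_id, name)
    return bundle
```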
In this embodiment, the storage module may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The storage module may be used to store the data content of the data source module 210 and the operating status of the execution module 220, the monitoring module 230, the download module 240, and so on. The storage module may also store a program, which the processing module executes after receiving an execution instruction.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the data downloading method described in the above embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented in hardware, or in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application can be embodied as a software product stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and including several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the method described in the embodiments of the present application.
In summary, the embodiments of the present application provide a method and an apparatus for downloading big data middlebox data, and a storage medium. In this scheme, data downloading from multiple data sources is unified through configuration items, so that data sources are managed uniformly in the big data middlebox: they can be added, deleted, modified, and queried, and their use is clear, convenient, and fast. In addition, the download programs are developed on a big data engine, and the download programs for different data sources are executed by a unified task scheduling service and monitored by a unified task monitoring platform. This avoids a bottleneck on the volume of data that can be downloaded, allows massive data to be downloaded and processed, improves download efficiency, achieves unified management, and speeds up development iteration.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus and method embodiments described above are merely illustrative. The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. In addition, the functional modules in the embodiments of the present application may be integrated into an independent part, each module may exist separately, or two or more modules may be integrated into an independent part.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A method for downloading big data middlebox data, characterized in that the method comprises the following steps:
S1, data source management: adding, deleting, modifying, and querying data sources, and managing the data sources uniformly through data source configuration items;
S2, developing a data downloading program, wherein the data downloading program is developed based on a big data engine;
S3, submitting a data downloading task: completing the task submission of the data downloading program based on the big data engine;
S4, monitoring the data downloading task: monitoring the data downloading program task based on a big data task monitoring component;
and S5, downloading the data file: in the data downloading task, downloading the result data from the temporary data file storage system directory to the client.
2. The method for downloading big data middlebox data according to claim 1, wherein in step S1 the data sources include, but are not limited to, databases (including but not limited to MySQL, Oracle, HBase, Hive, and ES), data files (including but not limited to HDFS, FTP, and S3), interfaces (including but not limited to REST API and WebSocket), and message buses (including but not limited to Kafka, RabbitMQ, and Pulsar).
3. The method for downloading big data middlebox data according to claim 2, wherein the method further comprises:
managing the data sources uniformly through the data source configuration items by configuring connection addresses, user names, passwords, or authentication key files, the configuration items being usable to add, delete, modify, and query the data sources.
4. The method for downloading big data middlebox data according to claim 1, wherein the method further comprises:
developing download programs for different data sources according to different big data engines, wherein the download program acquires, according to task execution rules, the parameters required for connecting to the data source, the parameters required for connecting to the temporary data file system, and the data download requirement information; after the download is executed, the downloaded data is uniformly written into the temporary data file system, a download result data file is obtained, and the configuration items are linked.
5. The method for downloading big data middlebox data according to claim 4, wherein the method further comprises:
the download result data file comprises a data folder and a json description file, the data folder being a compressed file of one or more data files, and the json description file including, but not limited to, structured data field names, data types, data field descriptions, a data task id, the number of compressed data files, and a data task serial number;
the json description file further includes, for each compressed data file in the data folder, the data file suffix, compression format, file name, file size, and the number of data sets in the data file.
6. The method for downloading big data middlebox data according to claim 4, wherein the method further comprises:
in the process of completing the task submission of the data downloading program, calling a big data engine, or invoking a third-party service, to perform the submission;
wherein the big data engine or third-party service is called according to the data source configuration information, the temporary download storage path, the file system configuration information, and the data download rules passed in the data downloading program.
7. The method for downloading big data middlebox data according to claim 1, wherein the method further comprises:
in the process of monitoring the data downloading task, using the task resource management component of the big data cluster and calling its task resource interface to monitor the execution resources, execution state, execution result, and execution log of the data downloading task.
8. The method for downloading big data middlebox data according to claim 5, wherein the method further comprises:
when the submitted download task has been executed, modifying the task-related information in the json description file of the download result data file, and downloading the modified json description file together with the data files to the client;
and when the data download task has not been executed, starting the data downloading program and, after monitoring shows that the task has finished, downloading the data from the temporary data file storage system directory to the client.
9. An apparatus for downloading big data, the apparatus comprising:
a data source module, configured to add, delete, modify, and query data sources and to manage them uniformly through data source configuration items or configuration files;
an execution module, configured to develop a data downloading program based on a big data engine and to complete the task submission of the data downloading program;
a monitoring module, configured to monitor the data downloading program task via a big data task monitoring component;
and a download module, configured to download the result data files from the temporary data file storage system directory to the client.
10. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211509504.7A CN115756549A (en) | 2022-11-29 | 2022-11-29 | Method and device for downloading data of big data middlebox and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211509504.7A CN115756549A (en) | 2022-11-29 | 2022-11-29 | Method and device for downloading data of big data middlebox and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115756549A true CN115756549A (en) | 2023-03-07 |
Family
ID=85339971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211509504.7A Pending CN115756549A (en) | 2022-11-29 | 2022-11-29 | Method and device for downloading data of big data middlebox and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115756549A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118012918A (en) * | 2024-01-26 | 2024-05-10 | 北京朗维科技有限公司 | GNSS-R auxiliary data management system |
CN118012918B (en) * | 2024-01-26 | 2024-08-06 | 北京朗维科技有限公司 | GNSS-R auxiliary data management system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2016264496C1 (en) | Custom communication channels for application deployment | |
US20200142690A1 (en) | Shared Software Libraries for Computing Devices | |
CN110413595B (en) | Data migration method applied to distributed database and related device | |
US11080041B1 (en) | Operating system management for virtual workspaces | |
CN106843978B (en) | SDK access method and system | |
US20170034697A1 (en) | Pen Needle Outer Cover Concepts | |
US11762738B2 (en) | Reducing bandwidth during synthetic restores from a deduplication file system | |
US11405328B2 (en) | Providing on-demand production of graph-based relationships in a cloud computing environment | |
CN111344997B (en) | Reconnecting cryptographic key management system service instances | |
CN115756549A (en) | Method and device for downloading data of big data middlebox and storage medium | |
CN112650710B (en) | Data migration sending method and device, storage medium and electronic device | |
JP6418419B2 (en) | Method and apparatus for hard disk to execute application code | |
CN117527785B (en) | Method and system for supporting space engineering file data uploading and full link management | |
CN111431951B (en) | Data processing method, node equipment, system and storage medium | |
Vernik et al. | Stocator: Providing high performance and fault tolerance for apache spark over object storage | |
CN113806309B (en) | Metadata deleting method, system, terminal and storage medium based on distributed lock | |
US10162626B2 (en) | Ordered cache tiering for program build files | |
CN111125149B (en) | Hive-based data acquisition method, hive-based data acquisition device and storage medium | |
CN113010377A (en) | Method and device for collecting operation logs of operation | |
CN112559460A (en) | File storage method, device, equipment and storage medium based on artificial intelligence | |
US11416460B2 (en) | Source-agnostic service for performing deduplication for an object storage | |
US20180068003A1 (en) | Updating a local instance of a shared drive | |
US11379440B1 (en) | Correction, synchronization, and migration of databases | |
CN118193296A (en) | Data snapshot method, device, equipment and storage medium | |
CN114189512A (en) | Baseline code downloading method and device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |