
CN113760570B - Data processing method, device, electronic device, system and storage medium - Google Patents


Info

Publication number
CN113760570B
CN113760570B (application CN202110018238.7A)
Authority
CN
China
Prior art keywords
data
target
server
source
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110018238.7A
Other languages
Chinese (zh)
Other versions
CN113760570A (en)
Inventor
李豆豆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110018238.7A
Publication of CN113760570A
Application granted
Publication of CN113760570B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/54: Interprogram communication
    • G06F 9/542: Event management; Broadcasting; Multicasting; Notifications
    • G06F 9/546: Message passing systems or structures, e.g. queues
    • G06F 9/547: Remote procedure calls [RPC]; Web services
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/54: Indexing scheme relating to G06F9/54
    • G06F 2209/541: Client-server
    • G06F 2209/548: Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


An embodiment of the present invention discloses a data processing method, apparatus, electronic device, system, and storage medium. The data processing method comprises: obtaining source data from a source server through a primary thread and storing the source data in a primary cache; obtaining the source data from the primary cache through a secondary thread, deserializing the source data to obtain target data, and storing the target data in a secondary cache; and obtaining the target data from the secondary cache through a tertiary thread and writing the target data to a target server. Embodiments of the present invention can reduce the demands placed on technical personnel and lower operation and maintenance costs.

Description

Data processing method, device, electronic device, system and storage medium
Technical Field
The present invention relates to data synchronization technology, and in particular, to a data processing method, apparatus, electronic device, system, and storage medium.
Background
Kafka is an open-source stream processing platform characterized by high throughput, low latency, and high performance. It is widely used by large Internet companies, which treat Kafka as a source of real-time data and synchronize that data to their own servers in real time. In the process of devising the present invention, the inventor found that most current Kafka data synchronization schemes are based on real-time computing frameworks such as Flink, Spark Streaming, and Flume. These frameworks have a high barrier to entry and place high demands on technical personnel, and they depend on external components such as ZooKeeper and Hadoop to provide services, so operation and maintenance costs are high.
Disclosure of Invention
The embodiment of the present invention provides a data processing method, an apparatus, an electronic device, a system, and a storage medium, which can reduce the demands placed on technical personnel and lower operation and maintenance costs.
In a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
acquiring source data from a source server through a first-level thread, and storing the source data into a first-level cache;
acquiring the source data from the first-level cache through a second-level thread, performing deserialization processing on the source data to obtain target data, and storing the target data into a second-level cache;
and acquiring the target data from the second-level cache through a third-level thread, and writing the target data into a target server.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, the apparatus including:
a first processing module, configured to acquire source data from a source server through a first-level thread and store the source data into a first-level cache;
a second processing module, configured to acquire the source data from the first-level cache through a second-level thread, perform deserialization processing on the source data to obtain target data, and store the target data into a second-level cache;
and a third processing module, configured to acquire the target data from the second-level cache through a third-level thread and write the target data into a target server.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the data processing method according to any one of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a data processing system, including a source server, a target server, and an electronic device configured to execute the data processing method according to any one of the embodiments of the present invention.
In a fifth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method according to any of the embodiments of the present invention.
In the embodiment of the present invention, source data can be acquired from the source server through the first-level thread and stored in the first-level cache; the source data can be acquired from the first-level cache through the second-level thread and deserialized to obtain target data, which is stored in the second-level cache; and the target data can be acquired from the second-level cache through the third-level thread and written into the target server. This reduces the demands placed on technical personnel and lowers operation and maintenance costs.
Drawings
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention.
Fig. 2 is another flow chart of a data processing method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a thread framework according to an embodiment of the present invention.
Fig. 4 is a schematic flow chart of a data processing method according to an embodiment of the invention.
Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a data processing system according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention, where the method may be performed by a data processing apparatus according to an embodiment of the present invention, and the apparatus may be implemented in software and/or hardware. In a specific embodiment, the apparatus may be integrated in a server. The following embodiments will be described taking the example that the apparatus is integrated in a server.
In a specific implementation, the server may be a synchronization server for synchronizing data from the source server to the target server, where the source server may be a Kafka server, and the target server may be any one of a Hadoop Distributed File System (HDFS) server, an HBase server, a ClickHouse server, a Hive server, and an Elasticsearch server. The synchronization server may be configured with a Kubernetes (K8s) environment. K8s is Google's open-source container scheduling engine, mainly responsible for running containers on host machines and for container management and scheduling. In particular, in the embodiment of the present invention, the thread framework implementing the data processing method of the embodiment of the present invention may be deployed in the K8s environment and run in the form of a container.
Referring to fig. 1, the data processing method provided in the embodiment of the present invention specifically includes the following steps:
Step 101, obtaining source data from a source server through a first-level thread, and storing the source data into a first-level cache.
For example, the source server may create a topic (topic) queue according to a data type and transmit the produced source data to a topic queue of a corresponding type, and the synchronization server may subscribe to the topic queue created by the source server, then acquire the source data from the subscribed topic queue through a primary thread and store the acquired source data in a primary cache.
In the embodiment of the invention, when the primary thread acquires the source data from the subscribed topic queue, the primary thread can firstly establish the source client, the source client can be used as a consumer of the data in the topic queue, and the source client is utilized to acquire the source data from the subscribed topic queue. For example, when the source server is a Kafka server, a Kafka client may be created by a primary thread, and source data may be acquired from a corresponding topic queue in the Kafka server using the Kafka client.
In particular, the acquired source data may be a batch of data expressed as key-value pairs, which may be packaged piece by piece after being acquired. In a specific embodiment, there may be one or more first-level caches. When there is one first-level cache, all source data may be stored directly in it. When there are several first-level caches, the first-level cache in which each piece of source data should be stored may be determined from that piece's hash modulus value: for any piece of source data, a hash modulo operation is performed according to the number of first-level caches to obtain a first modulus value, a first-level target cache is determined from the first-level caches according to the first modulus value, and the piece of source data is stored in that first-level target cache; in this way, a first-level target cache is determined for each piece of source data. When performing the hash modulo operation, it may be performed on the key of each piece of source data.
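The key-hash routing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `pick_cache` helper and the use of MD5 as the hash are assumptions chosen only to make the modulo routing stable and reproducible.

```python
import hashlib
from queue import Queue

def pick_cache(key, caches):
    """Route a record to one of N caches by hashing its key modulo N (the 'first modulus value')."""
    # A stable hash ensures the same key always lands in the same cache.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]

caches = [Queue() for _ in range(3)]
records = [("user:1", b"a"), ("user:2", b"b"), ("user:1", b"c")]
for key, value in records:
    pick_cache(key, caches).put((key, value))
```

Because the modulo is taken over the key alone, all records sharing a key are routed to the same first-level cache and therefore to the same downstream thread.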
Step 102, obtaining source data from the first-level cache through the second-level thread, performing deserialization processing on the source data to obtain target data, and storing the target data into the second-level cache.
For example, the number of the secondary threads may be one or more, and the number of the primary caches may be set according to the number of the secondary threads (i.e., the number of the upstream caches may be set according to the number of the downstream threads), for example, the number of the primary caches may be set to be the same as the number of the secondary threads. When there are multiple secondary threads and primary caches, a primary corresponding relationship may be preset, where the primary corresponding relationship includes a corresponding relationship between the secondary threads and the primary caches, for example, identification information (such as a thread name, a number, etc.) of each secondary thread may be bound with identification information (such as a cache address, a number, etc.) of the corresponding primary cache in the primary corresponding relationship.
Specifically, when the first-level cache and the second-level thread are only one, the second-level thread can be directly utilized to acquire the source data from the first-level cache, and when the first-level cache and the second-level thread are both multiple, the first-level cache corresponding to each second-level thread can be determined according to the first-level corresponding relation, and the source data is acquired from the corresponding first-level cache through each second-level thread.
After the source data is acquired, the secondary thread may deserialize it, for example converting source data in a binary-stream or text-stream structure into a data structure that the secondary thread can easily process and read; the deserialized data may further be cleaned, encrypted, and so on, to obtain the target data.
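A minimal sketch of such a deserialization step, assuming the source payloads are JSON-encoded bytes (the encoding and the null-dropping "cleaning" rule are illustrative assumptions, not specified by the text):

```python
import json

def deserialize(raw):
    """Decode a raw byte payload into a structured record (assumes JSON-encoded source data)."""
    record = json.loads(raw.decode("utf-8"))
    # An example "cleaning" step applied after deserialization: drop null-valued fields.
    return {k: v for k, v in record.items() if v is not None}

target = deserialize(b'{"id": 7, "name": "item", "note": null}')
```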
After the target data is obtained, it can be packaged, and the packaged target data is stored into a secondary cache through the secondary thread. In a specific embodiment, there may likewise be one or more secondary caches. When there is one secondary cache, all target data may be stored directly in it. When there are several secondary caches, the secondary cache in which each item of target data should be stored may be determined from that item's hash modulus value: for any item of target data, a hash modulo operation is performed according to the number of secondary caches to obtain a second modulus value, a secondary target cache is determined from the secondary caches according to the second modulus value, and the item of target data is stored in that secondary target cache; in this way, a secondary target cache is determined for each item of target data. When performing the hash modulo operation, it may be performed on the key of each item of target data.
Step 103, obtaining target data from the secondary cache through the tertiary thread, and writing the target data into the target server.
For example, the number of the secondary caches may be set according to the number of the tertiary threads, such as setting the number of the secondary caches to be the same as the number of the tertiary threads. When there are multiple tertiary threads and secondary caches, a secondary corresponding relation can be preset, and the secondary corresponding relation includes a corresponding relation between the tertiary threads and the secondary caches, for example, identification information (such as a thread name, a number and the like) of each tertiary thread can be bound with identification information (such as a cache address, a number and the like) of the corresponding secondary cache in the secondary corresponding relation.
Specifically, when there is only one secondary cache and one tertiary thread, the tertiary thread can directly acquire the target data from the secondary cache; when there are multiple secondary caches and tertiary threads, the secondary cache corresponding to each tertiary thread can be determined according to the secondary correspondence, and the target data is acquired from the corresponding secondary cache through each tertiary thread.
In the embodiment of the present invention, the primary cache and the secondary cache may be commonly used cache structures, such as a ring buffer (RingBuffer), a blocking queue (ArrayBlockingQueue) cache, and the like, which are not limited herein.
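A bounded blocking queue is a natural fit for these inter-thread caches because a full queue exerts backpressure on the producing thread. A small sketch using Python's standard `queue.Queue` as a stand-in for an ArrayBlockingQueue-style cache (the capacity of 2 is an arbitrary illustration):

```python
from queue import Queue, Full

cache = Queue(maxsize=2)  # bounded buffer: producers block (or fail fast) when it is full
cache.put("a")
cache.put("b")
try:
    cache.put("c", block=False)  # cache is full, so a non-blocking put raises Full
    overflowed = False
except Full:
    overflowed = True
```

In the real pipeline the producer would typically use a blocking `put`, so an overloaded downstream stage automatically slows the upstream one.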
After the target data is acquired, the target data may be written to the target server using three levels of threads. In a specific implementation, when target data is written into a target server by using three-level threads, a target client can be created by the three-level threads, the target client can serve as a consumer of the target data, and the target client is used for writing the target data into the target server. For example, when the target server is an HBase server, an HBase client may be created by three levels of threads, and target data may be written into the HBase server by using the HBase client. For example, when the target server is an HDFS server, the HDFS client may be created by three levels of threads, with the HDFS client being utilized to write target data to the HDFS server.
In a specific implementation, when the target client is used to write target data into the target server, the storage format supported by the target server can be determined first, and the target client then writes the target data in batches in that format. For example, when the target server is an HDFS server, the storage format may be the SequenceFile or MapFile format; when the target server is a Hive server, it may be the TextFile format; when the target server is a ClickHouse server, it may be a tuple format; and so on.
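The format-selection step can be sketched as a simple lookup. The mapping below is assembled from the formats named in the text; the dictionary keys and the `pick_format` helper are illustrative assumptions, not an API of any of these products.

```python
# Assumed mapping from target-server type to supported storage formats (illustrative only).
STORAGE_FORMATS = {
    "hdfs": ["SequenceFile", "MapFile"],
    "hive": ["TextFile"],
    "clickhouse": ["tuple"],
}

def pick_format(server_type):
    """Return a storage format supported by the given target-server type."""
    formats = STORAGE_FORMATS.get(server_type.lower())
    if formats is None:
        raise ValueError(f"no known storage format for {server_type}")
    return formats[0]
```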
According to the above technical scheme, source data can be acquired from the source server through the first-level thread and stored in the first-level cache; the source data can be acquired from the first-level cache through the second-level thread and deserialized to obtain target data, which is stored in the second-level cache; and the target data can be acquired from the second-level cache through the third-level thread and written into the target server. This reduces the demands placed on technical personnel and lowers operation and maintenance costs.
In the embodiment of the invention, the primary thread is used for interfacing with the source server, so that only one primary thread can be provided, a plurality of secondary threads and three-level threads can be provided, and the data can be processed in a parallel mode by the plurality of secondary threads and three-level threads, thereby improving the data processing efficiency.
The data processing method provided by the embodiment of the invention is described below by taking a plurality of secondary threads and tertiary threads as examples, and as shown in fig. 2, the method comprises the following steps:
In step 201, a source client is created by a primary thread, and source data is acquired from a source server by using the source client.
Specifically, the source client may be used as a consumer of data in the topic queue in the source server to obtain the source data from the subscribed topic queue. For example, when the source server is a Kafka server, a Kafka client may be created by a primary thread, and source data may be acquired from a corresponding topic queue in the Kafka server using the Kafka client.
Step 202, performing a hash modulo operation on the source data according to the number of first-level caches to obtain a first modulus value, determining a first-level target cache from the first-level caches according to the first modulus value, and storing the source data into the first-level target cache.
Specifically, the acquired source data may be a batch of data expressed as key-value pairs. After the source data is acquired, it may be packaged piece by piece, and the first-level cache in which each piece of source data should be stored is then determined from that piece's hash modulus value. For any piece of source data, a hash modulo operation is performed according to the number of first-level caches to obtain a first modulus value, a first-level target cache is determined from the first-level caches according to the first modulus value, and the piece of source data is stored in that first-level target cache; in this way, a first-level target cache is determined for each piece of source data. When performing the hash modulo operation, it may be performed on the key of each piece of source data.
Step 203, determining the first-level cache corresponding to each second-level thread according to the first-level correspondence, acquiring source data from that first-level cache through the second-level thread, and performing deserialization processing on the source data to obtain target data.
The first-level corresponding relation can be preset, and the first-level corresponding relation comprises a corresponding relation between the second-level threads and the first-level cache, for example, the identification information (such as a thread name, a number and the like) of each second-level thread can be bound with the identification information (such as a cache address, a number and the like) of the corresponding first-level cache in the first-level corresponding relation. The first-level cache corresponding to each second-level thread can be determined according to the first-level corresponding relation, and the source data is acquired from the first-level cache corresponding to each second-level thread.
After the source data is acquired, the second-level thread may deserialize it, for example converting source data in a binary-stream or text-stream structure into a data structure that the second-level thread can easily process and read; the deserialized data may further be cleaned, encrypted, and so on, to obtain the target data.
Step 204, performing a hash modulo operation on the target data according to the number of secondary caches to obtain a second modulus value, determining a secondary target cache from the secondary caches according to the second modulus value, and storing the target data into the secondary target cache.
After the target data is obtained, it can be packaged, and a hash modulo operation is then performed on each item of packaged target data to determine the secondary cache in which it should be stored. For any item of target data, a hash modulo operation is performed according to the number of secondary caches to obtain a second modulus value, a secondary target cache is determined from the secondary caches according to the second modulus value, and the item of target data is stored in that secondary target cache; in this way, a secondary target cache is determined for each item of target data. When performing the hash modulo operation, it may be performed on the key of each item of target data.
Step 205, determining a secondary cache corresponding to the tertiary thread according to the secondary correspondence, and acquiring target data from the secondary cache corresponding to the tertiary thread through the tertiary thread.
The secondary correspondence may be preset and includes a correspondence between the tertiary threads and the secondary caches; for example, in the secondary correspondence, identification information (such as a thread name, a number, etc.) of each tertiary thread may be bound to identification information (such as a cache address, a number, etc.) of the corresponding secondary cache. The secondary cache corresponding to each tertiary thread can be determined according to the secondary correspondence, and target data is acquired from that secondary cache through each tertiary thread.
In step 206, the storage format supported by the target server is determined.
For example, when the target server is an HDFS server, the storage format may be the SequenceFile or MapFile format; when the target server is a Hive server, it may be the TextFile format; when the target server is a ClickHouse server, it may be a tuple format; and so on.
In step 207, the target data is written into the target server according to the storage format through the three-level thread.
Specifically, when the target data is written into the target server by using the three-level thread, the target client can be created by the three-level thread, and can be used as a consumer of the target data, and the target client is used for writing the target data into the target server. For example, when the target server is an HBase server, an HBase client may be created by three levels of threads, and target data may be written into the HBase server by using the HBase client. For example, when the target server is an HDFS server, the HDFS client may be created by three levels of threads, with the HDFS client being utilized to write target data to the HDFS server.
In a specific embodiment, the primary thread may be implemented using a source operator, the secondary thread may be implemented using a parameter operator, and the tertiary thread may be implemented using a sink operator.
In a specific embodiment, taking three secondary threads and three tertiary threads as an example, the thread framework of the embodiment of the present invention may be as shown in fig. 3. The primary thread obtains source data from the source server, determines by calculation the primary cache in which each piece of source data should be stored, and stores the source data there. Each secondary thread determines its corresponding primary cache according to the primary correspondence, obtains source data from that cache, deserializes it to obtain target data, determines by calculation the secondary cache in which each item of target data should be stored, and stores the target data there. Each tertiary thread determines its corresponding secondary cache according to the secondary correspondence, obtains the target data from that cache, and writes it into the target server.
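The three-stage framework described above can be sketched as threads connected by queues. This is a simplified model under stated assumptions: two lanes instead of three, JSON payloads, Python's built-in `hash` for routing, a sentinel object for shutdown, and a plain list standing in for the target server.

```python
import json
import queue
import threading

SENTINEL = object()  # end-of-stream marker

def source(primaries, records):
    # Primary thread: fetch source data and route each record to a
    # primary cache by its key's hash modulo the cache count.
    for key, raw in records:
        primaries[hash(key) % len(primaries)].put((key, raw))
    for q in primaries:
        q.put(SENTINEL)

def transform(in_cache, out_cache):
    # Secondary thread: deserialize source data into target data.
    while True:
        item = in_cache.get()
        if item is SENTINEL:
            out_cache.put(SENTINEL)
            return
        key, raw = item
        out_cache.put((key, json.loads(raw)))

def sink(in_cache, written):
    # Tertiary thread: "write" target data (collected into a list here;
    # a real sink would batch-write to the target server).
    while True:
        item = in_cache.get()
        if item is SENTINEL:
            return
        written.append(item)  # list.append is atomic under CPython's GIL

records = [(f"k{i}", json.dumps({"n": i}).encode()) for i in range(6)]
primaries = [queue.Queue() for _ in range(2)]
secondaries = [queue.Queue() for _ in range(2)]
written = []

threads = [threading.Thread(target=transform, args=(primaries[i], secondaries[i])) for i in range(2)]
threads += [threading.Thread(target=sink, args=(secondaries[i], written)) for i in range(2)]
for t in threads:
    t.start()
source(primaries, records)
for t in threads:
    t.join()
```

Note how the fixed pairing of `primaries[i]` with `transform` thread `i`, and of `secondaries[i]` with `sink` thread `i`, models the preset primary and secondary correspondences.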
The thread framework shown in fig. 3 uses multiple secondary and tertiary threads in equal numbers purely for illustration; in a specific implementation, the numbers of secondary threads and tertiary threads may differ, which is not limited herein.
In the embodiment of the present invention, the data in the source server can be synchronized to the target server by invoking each thread, so only the running environment of the thread framework formed by these threads needs to be configured (for example, configuring the framework to run as a container in a K8s environment). This reduces the demands placed on technical personnel; since no external component is needed to provide services, operation and maintenance costs are reduced. Moreover, multiple secondary threads and tertiary threads can be provided so that the data is processed in parallel, improving data processing efficiency.
In a specific embodiment, repeated consumption and loss of data can be avoided by recording the consumption position of the data. A specific implementation may be as shown in fig. 4 and includes the following steps:
In step 301, a data consumption record of a source client is created by a primary thread, and the data consumption record is sent to a secondary thread.
For example, after the primary thread creates the source client, a snapshot of the source client's consumption location for the data in the subscribed topic queue may be created periodically, the consumption location snapshot may be encapsulated into a data consumption record, and the data consumption record may be sent to the downstream secondary threads by the primary thread in a broadcast manner.
In step 302, the data consumption record is sent to a target client created by the tertiary thread through the secondary thread.
After receiving the data consumption record, each secondary thread may send the data consumption record to a downstream tertiary thread in the form of a broadcast.
In step 303, the target client writes the target data into the target server in batches according to the data consumption record.
In a specific implementation, after each tertiary thread acquires the target data, it may wait for the broadcast data consumption record; after the data consumption record is received, the target client may be utilized to write the target data into the target server in batches according to the data consumption record. For example, the current consumption location may be determined according to the data consumption record, and the target data before the current consumption location may be written in batches to the target server by the target client.
In step 304, the data consumption record is sent to the source server through the four-level thread.
Specifically, after the target data prior to the current consumption location is written in batches to the target server with the target client, the data consumption record may be sent to the source server by a four-level thread. Thus, after the thread framework on the synchronization server is restarted and source data is acquired from the source server again, the source server can provide data to the synchronization server according to the data consumption record, so that repeated consumption and loss of data are avoided, and the batch writing mode improves writing efficiency.
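Steps 301 to 304 amount to a commit-after-write offset protocol: the target is written in batches first, and only then is the consumption position acknowledged back to the source, so a restart resumes from the last acknowledged position. A minimal sketch, in which the `Source` class, its method names, and the in-memory "server" are illustrative assumptions:

```python
class Source:
    """Stands in for the source server's topic queue (illustrative)."""
    def __init__(self, records):
        self.records = records
        self.committed = 0              # last acknowledged offset (step 304)

    def fetch(self, offset, max_n=5):
        return self.records[offset:offset + max_n]

    def commit(self, offset):
        self.committed = offset         # durable data consumption record

def sync(source, target):
    """Consume, batch-write, then commit, so a restart resumes safely."""
    offset = source.committed           # resume from the acknowledged position
    while True:
        batch = source.fetch(offset)
        if not batch:
            break
        target.extend(batch)            # step 303: batch write to the target
        offset += len(batch)
        source.commit(offset)           # step 304: send record to the source
    return offset

src = Source(list(range(12)))
tgt = []
sync(src, tgt)
# After a "restart", nothing is re-consumed and nothing is lost:
sync(src, tgt)
print(tgt == list(range(12)))  # True
```

Committing only after the batch write succeeds is what prevents data loss; resuming from the committed offset is what prevents repeated consumption.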
By means of the data processing method provided by the thread framework of the embodiment of the invention, the data in the Kafka server (or server cluster) can be synchronized in real time to an HDFS server, an HBase server, a Clickhouse server, a Hive server, an Elasticsearch server, and the like. Taking real-time synchronization to an HDFS server as an example, on the premise that the Kafka topic has 1 partition and the container resource specification is 4 cores and 12 GB of memory, real-time synchronization of 12 GB of data can be completed within 1 minute, and the data synchronization performance can be scaled horizontally with the number of Kafka topic partitions, that is, the amount of data synchronized in real time per minute = number of topic partitions × 12 GB.
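Under the linear-scaling relationship stated above (an assumption of the cited benchmark, with one 4-core/12 GB container per partition), the per-minute throughput can be estimated as:

```python
def throughput_gb_per_minute(num_partitions, per_partition_gb=12):
    """Estimated real-time sync throughput, assuming throughput scales
    linearly with the number of Kafka topic partitions as stated above."""
    return num_partitions * per_partition_gb

print(throughput_gb_per_minute(1))   # 12 GB/min, the benchmarked baseline
print(throughput_gb_per_minute(4))   # 48 GB/min with 4 partitions
```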
Fig. 5 is a block diagram of a data processing apparatus according to an embodiment of the present invention, which is adapted to perform the data processing method according to the embodiment of the present invention. As shown in fig. 5, the apparatus may specifically include:
A first processing module 501, configured to obtain source data from a source server through a first level thread, and store the source data into a first level cache;
The second processing module 502 is configured to obtain the source data from the first-level cache through a second-level thread, perform deserialization processing on the source data to obtain target data, and store the target data into the second-level cache;
And a third processing module 503, configured to obtain the target data from the second level cache through a third level thread, and write the target data into a target server.
In one embodiment, the first processing module 501 obtains source data from a source server through a primary thread, including creating a source client through the primary thread, and obtaining the source data from the source server by using the source client;
The third processing module 503 writes the target data to a target server through a tertiary thread, including creating a target client through the tertiary thread, and writing the target data to the target server using the target client.
In one embodiment, the first processing module 501 stores the source data in a first level cache, including performing a hash modulo operation on the source data according to the number of the first level caches to obtain a first modulus value, determining a first level target cache from the first level cache according to the first modulus value, and storing the source data in the first level target cache;
The second processing module 502 stores the target data in a secondary cache, including performing a hash modulo operation on the target data according to the number of secondary caches to obtain a second modulus value, determining a secondary target cache from the secondary caches according to the second modulus value, and storing the target data in the secondary target cache.
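The hash-then-modulo routing used for both cache levels can be sketched as follows. The choice of MD5 as the hash function and the key format are illustrative assumptions; the patent only requires a hash modulo operation over the number of caches.

```python
import hashlib

def pick_cache(record_key, num_caches):
    """Route a record to a cache index via hash-then-modulo.
    A stable digest (rather than Python's per-process randomized hash())
    is used so the mapping survives restarts; MD5 is illustrative."""
    digest = hashlib.md5(record_key.encode()).hexdigest()
    return int(digest, 16) % num_caches   # the "modulus value"

# The same key always lands in the same cache, and every key maps to
# a valid cache index, spreading load across the caches.
print(pick_cache("order-42", 3) == pick_cache("order-42", 3))  # True
```

Because the modulus value is derived from the record itself, all copies of a given record deterministically reach the same downstream thread, which keeps per-record ordering without any coordination between threads.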
In one embodiment, the second processing module 502 obtains the source data from the first level cache through a second level thread, including determining a first level cache corresponding to the second level thread according to a first level correspondence, and obtaining the source data from the first level cache corresponding to the second level thread through the second level thread;
The third processing module 503 obtains the target data from the secondary cache through a tertiary thread, including determining the secondary cache corresponding to the tertiary thread according to the secondary correspondence, and obtaining the target data from the secondary cache corresponding to the tertiary thread through the tertiary thread.
In an embodiment, the first processing module 501 is further configured to create, by using the primary thread, a data consumption record of the source client, and send the data consumption record to the target client created by using the tertiary thread through the secondary thread;
The third processing module 503 writes the target data to the target server using the target client, including:
And writing the target data into the target server in batches according to the data consumption record by using the target client.
In one embodiment, the apparatus further comprises:
And the fourth processing module is used for sending the data consumption record to the source server through a fourth-level thread.
In one embodiment, the third processing module 503 writes the target data to the target server through three levels of threads, including:
determining a storage format supported by the target server;
And writing the target data into the target server according to the storage format through the tertiary thread.
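The format-aware write in the two steps above can be sketched as follows; the mapping from target server type to format, the format names, and the serializer functions are all illustrative assumptions.

```python
import csv
import io
import json

# Map a target server type to a storage format it supports (illustrative).
SUPPORTED_FORMAT = {"hdfs": "csv", "elasticsearch": "json"}

def serialize(records, fmt):
    """Convert target data into the storage format the target supports."""
    if fmt == "json":
        return "\n".join(json.dumps(r) for r in records)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
fmt = SUPPORTED_FORMAT["hdfs"]          # step 1: determine supported format
out = serialize(rows, fmt)              # step 2: write in that format
print(out.splitlines()[0])  # id,v
```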
In an embodiment, the source server includes a Kafka server, and the target server includes any one of a distributed file system HDFS server, an HBase server, a Clickhouse server, a Hive server, and an Elasticsearch server.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working process of the functional module described above may refer to the corresponding process in the foregoing method embodiment, and will not be described herein.
According to the apparatus provided by the embodiment of the invention, source data can be acquired from the source server through the primary thread and stored in the primary cache; the source data can be acquired from the primary cache through the secondary thread and deserialized to obtain target data, which is stored in the secondary cache; and the target data can be acquired from the secondary cache through the tertiary thread and written into the target server.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the data processing method provided by any embodiment when executing the program.
The embodiment of the invention also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements the data processing method provided in any of the above embodiments.
FIG. 6 illustrates an exemplary system architecture to which a data processing method or data processing apparatus of an embodiment of the present invention may be applied.
As shown in fig. 6, the system architecture may include an electronic device 601, a source server 602, and a target server 603, where the source server 602 may be a Kafka server, and the target server 603 may be any one of an HDFS server, an HBase server, a Clickhouse server, a Hive server, and an Elasticsearch server. The electronic device 601 may be a synchronization server for synchronizing data of the source server 602 to the target server 603, and the electronic device 601 may be configured with a K8s environment. Specifically, in an embodiment of the present invention, a thread framework implementing the data processing method of the embodiment of the present invention may be deployed in a container to run in the K8s environment.
It should be noted that, the data processing method provided in the embodiment of the present invention is generally executed by the electronic device 601, and accordingly, the data processing apparatus is generally disposed in the electronic device 601.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an electronic device of an embodiment of the present invention. The electronic device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Connected to the I/O interface 705 are an input section 706 including a keyboard, a mouse, and the like, an output section 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like, a storage section 708 including a hard disk, and the like, and a communication section 709 including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the internet. The drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units involved in the embodiments of the present invention may be implemented in software, or may be implemented in hardware. The described modules and/or units may also be provided in a processor, for example, as a processor comprising a first processing module, a second processing module and a third processing module. The names of these modules do not constitute a limitation on the module itself in some cases.
As a further aspect, the invention also provides a computer readable medium which may be comprised in the device described in the above embodiments or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to include obtaining source data from a source server by a primary thread and storing the source data in a primary cache, obtaining the source data from the primary cache by a secondary thread and performing deserialization processing on the source data to obtain target data, storing the target data in a secondary cache, obtaining the target data from the secondary cache by a tertiary thread, and writing the target data to a target server.
According to the technical scheme of the embodiment of the invention, source data can be acquired from the source server through the primary thread and stored in the primary cache; the source data can be acquired from the primary cache through the secondary thread and deserialized to obtain target data, which is stored in the secondary cache; and the target data can be acquired from the secondary cache through the tertiary thread and written into the target server.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A data processing method, applied to a synchronization server, comprising:
creating a source client through a primary thread, acquiring source data subscribed to by the primary thread from a topic queue created by a source server by using the source client, storing the source data into a primary cache, and creating a data consumption record of the source client through the primary thread, wherein there is one primary thread;
the source data are obtained from the primary cache in parallel through a plurality of secondary threads, and are subjected to deserialization processing to obtain target data, and the target data are stored in the secondary cache;
The target data is obtained from the secondary cache in parallel through a plurality of tertiary threads and written into a target server, wherein the writing of the target data into the target server comprises creating a target client through the tertiary threads, and sending the data consumption record to the target client created by the tertiary threads through the secondary threads;
and sending the data consumption record to the source server through a four-level thread.
2. A data processing method according to claim 1, wherein,
The step of storing the source data into a first-level cache comprises the steps of carrying out hash modular operation on the source data according to the quantity of the first-level caches to obtain a first modular value, determining a first-level target cache from the first-level caches according to the first modular value, and storing the source data into the first-level target cache;
The storing of the target data into the secondary cache comprises the steps of carrying out hash modular operation on the target data according to the number of the secondary caches to obtain a second modular value, determining the secondary target cache from the secondary caches according to the second modular value, and storing the target data into the secondary target cache.
3. A data processing method according to claim 1, wherein,
The method for acquiring the source data from the primary cache through a plurality of secondary threads in parallel comprises the steps of determining the primary cache corresponding to the secondary threads according to a primary corresponding relation, and acquiring the source data from the primary cache corresponding to the secondary threads through the secondary threads;
the step of obtaining the target data from the secondary cache through a plurality of tertiary threads in parallel comprises the steps of determining the secondary cache corresponding to the tertiary threads according to the secondary corresponding relation, and obtaining the target data from the secondary cache corresponding to the tertiary threads through the tertiary threads.
4. The data processing method according to claim 1, wherein the writing of the target data to the target server by three-level threading comprises:
determining a storage format supported by the target server;
And writing the target data into the target server according to the storage format through the tertiary thread.
5. The data processing method according to claim 1, wherein the source server comprises a Kafka server, and the target server comprises any one of a distributed file system HDFS server, an HBase server, a Clickhouse server, a Hive server, and an Elasticsearch server.
6. A data processing apparatus for use with a synchronization server, comprising:
The first processing module is used for creating a source client through a primary thread, acquiring source data subscribed to by the primary thread from a topic queue created by a source server by using the source client, storing the source data into a primary cache, and creating a data consumption record of the source client through the primary thread, wherein there is one primary thread;
The second processing module is used for obtaining the source data from the first-level cache in parallel through a plurality of second-level threads, performing deserialization on the source data to obtain target data, and storing the target data into the second-level cache;
the third processing module is used for obtaining the target data from the secondary cache in parallel through a plurality of tertiary threads and writing the target data into a target server, wherein the writing the target data into the target server comprises creating a target client through the tertiary threads and sending the data consumption record to the target client created by the tertiary threads through the secondary threads;
And the fourth processing module is used for sending the data consumption record to the source server through a fourth-level thread.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the data processing method according to any one of claims 1 to 5 when executing the program.
8. A data processing system comprising an origin server, a destination server and an electronic device for performing the data processing method of any of claims 1 to 5.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the data processing method according to any one of claims 1 to 5.
CN202110018238.7A 2021-01-07 2021-01-07 Data processing method, device, electronic device, system and storage medium Active CN113760570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110018238.7A CN113760570B (en) 2021-01-07 2021-01-07 Data processing method, device, electronic device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110018238.7A CN113760570B (en) 2021-01-07 2021-01-07 Data processing method, device, electronic device, system and storage medium

Publications (2)

Publication Number Publication Date
CN113760570A CN113760570A (en) 2021-12-07
CN113760570B true CN113760570B (en) 2025-02-21

Family

ID=78786282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110018238.7A Active CN113760570B (en) 2021-01-07 2021-01-07 Data processing method, device, electronic device, system and storage medium

Country Status (1)

Country Link
CN (1) CN113760570B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118503148B (en) * 2024-07-17 2024-12-13 季华实验室 Multimodal data management system, method, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492018A (en) * 2018-09-12 2019-03-19 武汉达梦数据库有限公司 A kind of adaptive dynamic adjusting method of data synchronous system and device
CN110287023A (en) * 2019-06-11 2019-09-27 广州海格通信集团股份有限公司 Message treatment method, device, computer equipment and readable storage medium storing program for executing

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8433870B2 (en) * 2010-10-12 2013-04-30 International Business Machines Corporation Multiple incremental virtual copies
US9218305B2 (en) * 2011-11-30 2015-12-22 International Business Machines Corporation Reader-writer synchronization with high-performance readers and low-latency writers
CN103023809B (en) * 2012-12-28 2015-07-22 中国船舶重工集团公司第七0九研究所 Information system synchronous data processing method utilizing secondary buffer technology
US10282208B2 (en) * 2017-07-14 2019-05-07 International Business Machines Corporation Cognitive thread management in a multi-threading application server environment
CN107590277B (en) * 2017-09-28 2021-06-25 泰康保险集团股份有限公司 Data synchronization method, device, electronic device and storage medium
CN109947668B (en) * 2017-12-21 2021-06-29 北京京东尚科信息技术有限公司 Method and apparatus for storing data
CN109783255B (en) * 2019-01-07 2021-02-23 中国银行股份有限公司 Data analysis and distribution device and high-concurrency data processing method
CN111813805A (en) * 2019-04-12 2020-10-23 中国移动通信集团河南有限公司 A data processing method and device
CN110147398B (en) * 2019-04-25 2020-05-15 北京字节跳动网络技术有限公司 Data processing method, device, medium and electronic equipment
CN111367925A (en) * 2020-02-27 2020-07-03 深圳壹账通智能科技有限公司 Data dynamic real-time updating method, device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492018A (en) * 2018-09-12 2019-03-19 武汉达梦数据库有限公司 A kind of adaptive dynamic adjusting method of data synchronous system and device
CN110287023A (en) * 2019-06-11 2019-09-27 广州海格通信集团股份有限公司 Message treatment method, device, computer equipment and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN113760570A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
US11029866B2 (en) Methods, devices, and computer program products for processing data
US10838873B2 (en) Method, apparatus, and computer program product for managing addresses in distributed system
CN110908788B (en) Data processing method, device, computer equipment and storage medium based on Spark Streaming
CN105677469B (en) Timed task execution method and device
US20180300110A1 (en) Preserving dynamic trace purity
US11201836B2 (en) Method and device for managing stateful application on server
US10402223B1 (en) Scheduling hardware resources for offloading functions in a heterogeneous computing system
CN103401934A (en) Method and system for acquiring log data
CN109508326B (en) Method, device and system for processing data
CN110019087A (en) Data processing method and its system
CN111880948A (en) Data refreshing method and device, electronic equipment and computer readable storage medium
CN109614241A (en) The method and system of more cluster multi-tenant resource isolations are realized based on Yarn queue
US11481130B2 (en) Method, electronic device and computer program product for processing operation commands
CN113760570B (en) Data processing method, device, electronic device, system and storage medium
US10783003B2 (en) Method, device, and computer readable medium for managing dedicated processing resources
CN113934767B (en) A data processing method and device, computer equipment and storage medium
CN116009985A (en) Interface calling method, device, computer equipment and storage medium
WO2023019712A1 (en) Zlib compression algorithm-based cloud computing resource manager communication delay optimization method
US10764354B1 (en) Transmitting data over a network in representational state transfer (REST) applications
CN115225586B (en) Data packet transmitting method, device, equipment and computer readable storage medium
CN115378937B (en) Distributed concurrency method, device, equipment and readable storage medium for tasks
WO2025055184A1 (en) Data processing method, system and apparatus, and medium
CN117931439A (en) Map reduction task processing method, device, related equipment and program product
CN117950850A (en) Data transmission method, device, electronic equipment and computer readable medium
CN111625524B (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant