CN119961502A - Data processing method, device, equipment, storage medium and program product - Google Patents
- Publication number
- CN119961502A (application number CN202411820869.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- target
- service
- search engine
- configuration information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a data processing method, a device, equipment, a storage medium and a program product. The method comprises the steps of: reading data table configuration information when a service is started, and caching the data table configuration information to a file cache area of a target search engine; responding to a change request corresponding to target data, storing the target data into a target database, and generating a change log corresponding to the target data; pulling the change log in the target database through a data channel, and converting the target data into business data which can be identified by the target search engine according to the change log and the data table configuration information; writing the business data into a memory of the target search engine; and responding to a query request, refreshing the business data in the memory of the target search engine to the file cache area of the target search engine, and displaying a data service list containing the business data. The application can reduce data delay, improve data processing efficiency and achieve the effect of real-time data display.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, storage medium, and program product.
Background
With the advent of the big data age, the demand for real-time data in enterprise-level applications has increased. In particular, when data needs to be synchronized from a relational database (such as MySQL) to a full-text search engine (such as Elasticsearch, ES), data latency tends to occur, which degrades the user experience.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment, a storage medium and a program product, which can reduce data delay, improve data processing efficiency and achieve the effect of real-time data display.
In one aspect, an embodiment of the present application provides a data processing method, including:
Reading data table configuration information when a service is started, and caching the data table configuration information to a file cache region of a target search engine;
Responding to a change request corresponding to target data, storing the target data into a target database, and generating a change log corresponding to the target data, wherein the change log is used for recording change operation of the target data in the target database;
Pulling the change log in the target database through a data channel, and converting the target data into service data identifiable by a target search engine according to the change log and the data table configuration information;
Writing the service data into the memory of the target search engine;
And responding to the query request, refreshing the business data in the memory of the target search engine to a file cache area of the target search engine, and displaying a data service list containing the business data.
In another aspect, an embodiment of the present application provides a data processing apparatus, including:
The first processing unit is used for reading the data table configuration information when the service is started and caching the data table configuration information to a file cache area of the target search engine;
The change unit is used for responding to a change request corresponding to target data, storing the target data into a target database, and generating a change log corresponding to the target data, wherein the change log is used for recording change operation of the target data in the target database;
The second processing unit is used for pulling the change log in the target database through a data channel and converting the target data into service data which can be identified by a target search engine according to the change log and the data table configuration information;
The writing unit is used for writing the service data into the memory of the target search engine;
And the query unit is used for responding to a query request, refreshing the business data in the memory of the target search engine to a file cache area of the target search engine, and displaying a data service list containing the business data.
In another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the data processing method according to any one of the embodiments above by calling the computer program stored in the memory.
In another aspect, embodiments of the present application provide a computer readable storage medium storing a computer program adapted to be loaded by a processor to perform a data processing method according to any of the embodiments above.
In another aspect, an embodiment of the present application provides a computer program product comprising computer instructions which, when executed by a processor, implement a data processing method as described in any of the embodiments above.
In the embodiment of the application, data table configuration information is read when the service is started and cached in a file cache area of a target search engine; in response to a change request corresponding to target data, the target data is stored in a target database and a change log corresponding to the target data is generated, the change log being used for recording the change operation of the target data in the target database; the change log in the target database is pulled through a data channel, and the target data is converted into service data identifiable by the target search engine according to the change log and the data table configuration information; the service data is written into a memory of the target search engine; and in response to a query request, the service data in the memory of the target search engine is refreshed to the file cache area of the target search engine, and a data service list containing the service data is displayed. By reading and caching the data table configuration information when the service is started and forcibly refreshing the data in the target search engine when a query request arrives, the embodiment of the application markedly reduces the delay from data change to data display; a concurrent processing mechanism can further accelerate data writing and querying, improving the overall efficiency of data processing. Reduced data delay means that the user can see data changes on the front-end interface almost in real time, greatly improving the user experience. The embodiment of the application can thus reduce data delay, improve data processing efficiency and achieve real-time data display.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a data processing method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a first application scenario provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a second application scenario provided in an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides a data processing method, a device, equipment, a storage medium and a program product. Specifically, the data processing method of the embodiment of the present application may be performed by a computer device, where the computer device may be a terminal or a server. The terminal may be a smart phone, tablet computer, notebook computer, desktop computer, smart television, smart speaker, wearable smart device, smart vehicle-mounted terminal or the like, and may also include a client, where the client may be a financial client, a browser client, an instant messaging client or the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content distribution network services, big data and artificial intelligence platforms, but is not limited thereto.
Elasticsearch (ES for short) is a powerful open-source search engine that can help users quickly find the content they need in massive amounts of data.
ES data delay: after data is inserted into ES, the just-inserted data cannot be found immediately, because inserted data is not refreshed (i.e., made searchable) to the file system at insertion time, but only at a certain refresh interval.
Most systems that store data and display it in tables have a certain data delay: the data a user has just saved cannot yet be seen on the page, which makes for a poor user experience, so the problem of data delay needs to be solved.
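The near-real-time refresh behavior described above can be illustrated with a minimal pure-Python sketch (a toy model, not Elasticsearch's actual implementation): newly indexed documents sit in an in-memory buffer and only become searchable after a refresh.

```python
class ToyIndex:
    """Toy model of an ES index: a write buffer plus searchable segments."""

    def __init__(self):
        self._buffer = {}      # newly written docs, not yet searchable
        self._segments = {}    # refreshed docs, visible to queries

    def index(self, doc_id, doc):
        # Writes land in the buffer first, mirroring ES's in-memory indexing buffer.
        self._buffer[doc_id] = doc

    def refresh(self):
        # A refresh moves buffered docs into the searchable segments.
        self._segments.update(self._buffer)
        self._buffer.clear()

    def search(self, doc_id):
        # Queries only see refreshed (searchable) data.
        return self._segments.get(doc_id)


idx = ToyIndex()
idx.index("1", {"name": "Alice"})
print(idx.search("1"))   # None: just-inserted data is not yet visible
idx.refresh()            # forcing a refresh, as the embodiment does on query
print(idx.search("1"))   # {'name': 'Alice'}
```

This is why forcing a refresh at query time, as the embodiment does, makes just-written data immediately visible.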
By reading and caching the data table configuration information when the service is started and forcibly refreshing the data in the target search engine when a query request arrives, the embodiment of the application markedly reduces the delay from data change to data display; a concurrent processing mechanism can further accelerate data writing and querying, improving the overall efficiency of data processing. Reduced data delay means that the user can see data changes on the front-end interface almost in real time, greatly improving the user experience. The embodiment of the application can thus reduce data delay, improve data processing efficiency and achieve real-time data display.
According to the embodiment of the application, by analyzing the full life cycle of the data, identifying the stages that cause data delay, and optimizing each stage in a targeted way, the data delay is reduced from about 3 s to about 200 ms, a performance improvement of about 93%.
The following will describe in detail. It should be noted that the following description order of embodiments is not a limitation of the priority order of embodiments.
Referring to fig. 1 to 3, fig. 1 is a flowchart of a data processing method according to an embodiment of the present application, and fig. 2 and 3 are application scenario diagrams according to an embodiment of the present application. The method comprises the following steps:
step 110, reading the configuration information of the data table when the service is started, and caching the configuration information of the data table in a file cache area of the target search engine.
For example, when a service (such as a data synchronization service, a data search service, or a data processing service) is started, the data table configuration information stored in a configuration file, database, or other persistent store is first read. The data table configuration information generally comprises service association configuration tables associated with different data. A service association configuration table records the association relationship between the service configuration information corresponding to a service table and the source configuration information corresponding to a source table; the service configuration information defines the association relationship between the service table and the main table, and the source configuration information defines the association relationship between the source table and the main table in the database. The service then caches this configuration information in a file cache area of the target search engine (e.g., Elasticsearch). The purpose of this step is to reduce subsequent frequent access to the configuration information and to increase the efficiency of data processing. Through this caching mechanism, when the search engine needs to process a query request, the data table configuration information can be read directly from the cache rather than being fetched from the database or an external configuration file every time, which markedly reduces query latency.
For example, the service table is a table designed according to service requirements and is used for storing service data after processing, cleaning and conversion, such as an order table, a user table and the like.
For example, a master table refers to the location where the target data is stored, such as a table in a MySQL database used for storing the target data.
For example, a source table refers to a data table that needs to be synchronized to the target search engine (e.g., Elasticsearch, ES). The data changes of the source table are captured and synchronized into the target search engine to implement real-time search and data analysis functions.
For example, there is typically a one-to-one association of a source table with a master table in a database. This means that each source table corresponds to a master table, ensuring the accuracy and consistency of the data synchronization.
For example, a business table is a processed table that is one-to-one associated with a list that is actually displayed on the graphical user interface. This means that the data seen by the user on the interface is directly derived from the results of the processing of the business table.
For example, the service association configuration table is used for recording the association relationship between the service configuration information corresponding to the service table and the source configuration information corresponding to the source table. In general, one service configuration information may correspond to a plurality of source configuration information, which reflects that a service table may need to be constructed by integrating data of a plurality of source tables.
For example, there is one business table (employee information table) and two source tables (employee table and work record table). For example, business data in the business table includes business fields such as employee name, employee phone number, employer company 1-date, employer company 2-date, etc., the employee name and employee phone number in the employee table are synchronized to corresponding business fields in the business table, and the employer company 1-date and employer company 2-date in the job record table are synchronized to corresponding business fields in the business table. The service association configuration table records this association.
In some embodiments, the reading the data table configuration information at the service start-up and caching the data table configuration information in a file cache area of the target search engine includes:
reading data table configuration information when the service is started;
initializing the data channel;
and caching the data table configuration information to a file cache area of the target search engine.
Wherein, when a service is started, the source location storing the data table configuration information is accessed first. The data table configuration information may be stored in a configuration file (e.g., a JSON or YAML file), a database (e.g., a relational database such as MySQL or PostgreSQL), or other persistent storage. The data table configuration information generally comprises service association configuration tables for different data; a service association configuration table records the association relationship between the service configuration information corresponding to a service table and the source configuration information corresponding to a source table, the service configuration information defines the association relationship between the service table and the main table, and the source configuration information defines the association relationship between the source table and the main table in the database. The system reads the data table configuration information and performs the necessary parsing and verification to ensure its integrity and correctness.
After the data table configuration information is read and parsed, the data channel used for data synchronization or data exchange (e.g., Canal or Debezium) is initialized. The purpose of initializing the data channel is to establish a stable and reliable data transmission link, ensuring that data changes can subsequently be captured in real time or near real time from the source database or other data source.
Then, the parsed and verified data table configuration information is cached into the file cache area of the target search engine (e.g., Elasticsearch). The file cache area is an important part of the search engine used for storing index files and query caches; caching the data table configuration information into this area can markedly improve query efficiency. For example, a caching operation typically involves serializing the data table configuration information into a storable format (e.g., JSON or XML) and writing it under a cache path specified by the search engine. Meanwhile, to ensure consistency and availability of the data, the cache operation may also include a series of concurrency control and error handling mechanisms.
By caching the data table configuration information into the file cache area of the target search engine, the system is able to read the configuration information directly from the cache when processing a query request, without having to obtain it from the database or an external configuration file each time. This optimization strategy significantly reduces query latency and improves the response speed and throughput of the system. In addition, the caching mechanism helps to relieve pressure on the underlying storage system, because frequent access requests for configuration information are reduced. This is particularly important for large distributed systems, as it reduces the consumption of network bandwidth and the degree of coupling between systems. Although caching the configuration information into the file cache area of the search engine at service start-up brings a significant performance improvement, dynamic updating of the configuration information should also be considered. When the data table structure changes (e.g., a field is added or a field type is modified), the system needs to be able to detect this change and update the configuration information in the cache in real time. For example, a mechanism for listening for and updating configuration information may be implemented at the bottom layer: when a change in the data table configuration information is detected, an update operation is automatically triggered, and the latest data table configuration information is re-cached into the file cache area of the search engine. This mechanism ensures that the search engine always processes query requests using the latest configuration information.
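The read-parse-cache flow at service start-up can be sketched as follows. This is an illustrative simulation only: the function names, the JSON layout, and the use of a local file to stand in for the search engine's file cache area are all assumptions, not the patent's concrete implementation.

```python
import json
import os
import tempfile


def load_table_config(path):
    """Startup step: read and minimally validate the data table configuration."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    if "service_tables" not in config:  # minimal integrity check
        raise ValueError("invalid data table configuration")
    return config


def cache_config(config, cache_path):
    """Serialize the parsed configuration into the (simulated) file cache area."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(config, f)


def read_cached_config(cache_path):
    """Later queries read the configuration straight from the cache, not the source."""
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)


tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "tables.json")          # stands in for the config store
cache = os.path.join(tmp, "es_file_cache.json")  # stands in for the ES file cache
with open(src, "w", encoding="utf-8") as f:
    json.dump({"service_tables": {"employee_info": {"sources": ["employee", "work_record"]}}}, f)

cfg = load_table_config(src)   # startup: read + validate
cache_config(cfg, cache)       # cache into the file cache area
print(read_cached_config(cache)["service_tables"]["employee_info"]["sources"])
```

A production version would additionally watch the configuration source and re-cache on change, as described above.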
Step 120, in response to a change request corresponding to target data, storing the target data into a target database, and generating a change log corresponding to the target data, where the change log is used to record a change operation of the target data in the target database.
For example, when a change request (such as a new change, a modification, a deletion, a user information update, an order status change, etc.) is received for the target data, the service stores the target data in the database, applies the change operations corresponding to the change requests to the target database, and further generates a change log corresponding to the target data, where the change log is used for recording the change operations of the target data in the target database. This step ensures persistence and consistency of the data.
The target data refers to data that needs to be changed in the business process. Such data is typically associated with a particular business entity, such as personal information of the user, transaction records, order details, and the like. The target data may be structured, such as rows and columns in a database, or semi-structured or unstructured, such as a document or media file.
Wherein the target database is a database system for storing the target data. It may be a relational database (e.g., MySQL or PostgreSQL) that uses tables, rows, and columns to organize the data, or a non-relational database (e.g., MongoDB or Cassandra), which provides a more flexible data model suitable for processing large-scale distributed data.
For example, the change request may come from a variety of sources, such as user operations (e.g., adding, modifying, deleting user information), other service calls (e.g., order status changes), timing tasks (e.g., updating statistics periodically), and so forth. When a service receives a change request, it is necessary to process the change request and apply the change to the target database.
Storing target data into a target database, and ensuring the persistent storage of the data. For example, depending on the type of change request (e.g., add, modify, delete), the corresponding database operation is performed (e.g., INSERT, UPDATE, DELETE).
Then, a change log corresponding to the target data is generated. The change log is used for recording change operations of the target data in the target database, such as new addition, modification, deletion, user information update, order state change and the like.
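Step 120 can be sketched as follows: apply the change request to the target database (modeled here as a dict) and append a change-log entry recording the operation. The shapes of the database, log entries, and operation names are illustrative assumptions, loosely modeled on a row-based change log such as MySQL's binlog.

```python
import datetime

database = {}    # stands in for the target database table
change_log = []  # stands in for the database's change log (e.g. a binlog)


def apply_change(op, key, value=None):
    """Apply an INSERT/UPDATE/DELETE and append a log entry describing it."""
    before = database.get(key)
    if op in ("INSERT", "UPDATE"):
        database[key] = value
    elif op == "DELETE":
        database.pop(key, None)
    change_log.append({
        "op": op,
        "key": key,
        "before": before,                # row image before the change
        "after": database.get(key),     # row image after the change
        "ts": datetime.datetime.now().isoformat(),
    })


apply_change("INSERT", "order:1", {"status": "created"})
apply_change("UPDATE", "order:1", {"status": "paid"})
print(len(change_log), change_log[-1]["op"])
```

Recording before/after images in the log is what lets the downstream data channel reconstruct each change without re-querying the database.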
And 130, pulling the change log in the target database through a data channel, and converting the target data into service data which can be identified by a target search engine according to the change log and the data table configuration information.
For example, the service pulls the change log in the target database (e.g., a relational database) in real time through a data channel (e.g., Canal or Debezium). The change log records the change operations of the target data in the target database, such as addition, modification, deletion, user information update, and order status change. The service then converts the target data into a business data format recognizable by the target search engine (e.g., Elasticsearch) based on the data table configuration information and the change log. This step ensures the correct representation and indexing of the data in the search engine.
In some embodiments, step 130 may be implemented by steps 131 to 136 (not shown in the figures), specifically:
and 131, pulling a change log in the target database through a data channel.
For example, appropriate data channel tools (e.g., canal, debezium, etc.) are selected based on the configuration, which can capture change logs in a target database (e.g., a relational database such as MySQL, postgreSQL, etc.) in real-time. These logs detail the change operations of the target data in the target database, such as additions, modifications, deletions, user information updates, order status changes, etc.
For example, each time a data change occurs in the target database, a corresponding change log entry is generated. The data channel tool connects to the target database and listens to its change log (e.g., MySQL's binlog).
And step 132, constructing change data according to the change log, wherein the change data comprises detailed information of data change.
For example, the captured change log is parsed into structured change data that includes the details of the data change, such as the change type (e.g., addition, modification, deletion), the change time, the data before and after the change, and the table name and field names of the affected data table.
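A minimal sketch of step 132, turning a raw change-log event into structured change data; the event field names (`type`, `table`, `ts`, `before`, `after`) are assumptions for illustration, not any tool's actual wire format.

```python
def build_change_data(event):
    """Extract change type, table, time, changed fields, and before/after images."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    # A field counts as changed if its value differs between the two row images.
    changed = [k for k in set(before) | set(after) if before.get(k) != after.get(k)]
    return {
        "change_type": event["type"],   # e.g. INSERT / UPDATE / DELETE
        "table": event["table"],
        "change_time": event["ts"],
        "changed_fields": sorted(changed),
        "before": before,
        "after": after,
    }


raw = {
    "type": "UPDATE", "table": "employee", "ts": "2024-12-01T10:00:00",
    "before": {"id": 7, "phone": "111"},
    "after": {"id": 7, "phone": "222"},
}
print(build_change_data(raw)["changed_fields"])  # ['phone']
```

Keeping only the changed fields plus both row images gives the later steps everything they need to locate and rebuild the affected business rows.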
And step 133, writing the change data into a target source table of the target search engine.
For example, to support subsequent processing and indexing, the change data is first written into the target source table of the target search engine (e.g., Elasticsearch). This step ensures data integrity and consistency, and also facilitates subsequent batch or asynchronous processing.
For example, in general, there is a one-to-one association of a target source table with a target master table in a target database. When the change data is written into the target source table of the target search engine, the change data can be synchronized to the target master table in the target database.
Step 134, querying a service association configuration table associated with the changed data from the data table configuration information, where the service association configuration table is used to record an association relationship between service configuration information corresponding to the service table and source configuration information corresponding to the source table, the service configuration information is used to define an association relationship between the service table and the main table, and the source configuration information is used to define an association relationship between the source table and the main table in the database.
For example, according to the table name information in the change data, the system locates the corresponding service table in the data table configuration information. Then, a service association configuration table associated with the located service table is searched. If the service table has an association relation with a plurality of source tables, all relevant service association configuration tables need to be traversed to acquire complete association information. For example, by analyzing the service configuration information in the service association configuration table, the association relationship between the service table and the main table, such as one-to-one, one-to-many, or many-to-many, can be obtained. For example, association rules of the source table and the main table in the database, such as foreign key constraint, connection condition, etc., can be obtained by analyzing the source configuration information in the service association configuration table.
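The lookup in step 134 can be sketched as below. The flat list-of-dicts layout for the service association configuration table, and its field names, are assumptions made for illustration; the patent does not fix a concrete schema.

```python
# Hypothetical service association configuration: each entry links one service
# table to its master table, its source tables, and the join key between them.
ASSOC_CONFIG = [
    {"service_table": "employee_info", "master_table": "employee",
     "source_tables": ["employee", "work_record"], "join_key": "employee_id"},
    {"service_table": "order_summary", "master_table": "orders",
     "source_tables": ["orders"], "join_key": "order_id"},
]


def find_assoc_entries(changed_table):
    """Return every association entry whose source tables include the changed table."""
    return [e for e in ASSOC_CONFIG if changed_table in e["source_tables"]]


hits = find_assoc_entries("work_record")
print([e["service_table"] for e in hits])  # ['employee_info']
```

A change in one source table may fan out to several service tables, which is why all matching entries must be traversed rather than stopping at the first.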
And step 135, for each target business table related to the change, inquiring a main table data identifier according to the business association configuration table, and inquiring the corresponding target main table and the associated data in the target main table according to the main table data identifier.
For example, for each target service table related to the change, a main table data identifier (id) corresponding to each target service table is queried according to the service association configuration table. The primary table id is typically used to uniquely identify a record in the primary table and may be a primary key, a unique key, or other combination of fields capable of uniquely identifying the record.
The service association configuration table details the association between the service table and the main table, including which fields are foreign keys, and how they point to the corresponding fields of the main table.
Then, using the queried main table data identifier, the associated data corresponding to the service fields of the target service table is retrieved. The associated data can be acquired in two ways: in one case it can be obtained directly from the target main table, and in the other case it can be obtained from the target source table based on the source configuration information.
For example, when the association relationship between the target service table and the target main table is relatively simple, and all the association data required by the target service table is already contained in the target main table, the association data can be directly obtained from the target main table. The method comprises the following specific steps:
1) And inquiring the main table data identifier, namely inquiring the main table data identifier (such as a main key id) corresponding to the target service table according to the service association configuration table.
2) And searching the associated data, namely matching the queried main table data identifier to a corresponding target main table, and searching the associated data corresponding to the service field of the target service table in the target main table. This step is typically accomplished by SQL queries or NoSQL query statements, depending on the type of database used for the target primary table.
3) And acquiring complete associated data, namely directly retrieving the associated data for one-to-one or one-to-many relation. For many-to-many relationships, complete association data can be obtained by querying and merging results multiple times.
For example, suppose a change involves a target business table (an employee information table) whose business fields include employee name, employee phone number, employer company 1-date, employer company 2-date, etc. Associated with this business table are two target master tables (an employee table and a work record table). When the associated data is obtained from the target main tables, the main table data identifier (such as the primary key id) is first queried according to the service association configuration table and matched to target main table 1 (the employee table) and target main table 2 (the work record table) corresponding to the target service table (the employee information table). Then the employee name and employee phone number are looked up in the employee table using the business fields and the employee id of the target service table, and the employer company 1-date and employer company 2-date are retrieved in the work record table in the same way. The retrieved data are finally integrated into complete associated data.
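The employee example above can be worked through in a few lines. All data and field names here are invented for illustration; each dict stands in for a master table keyed by the employee id.

```python
# Master table 1: employee table (keyed by employee id).
employee = {7: {"name": "Alice", "phone": "138-0000"}}
# Master table 2: work record table (same key).
work_record = {7: {"company1_date": "2020-01", "company2_date": "2023-05"}}


def build_business_row(emp_id):
    """Join both master tables on the employee id into one business row."""
    row = {"employee_id": emp_id}
    row.update(employee.get(emp_id, {}))      # name, phone from master table 1
    row.update(work_record.get(emp_id, {}))   # employment dates from master table 2
    return row


print(sorted(build_business_row(7)))
# ['company1_date', 'company2_date', 'employee_id', 'name', 'phone']
```

The resulting row carries all the business fields of the employee information table and is what gets indexed into the search engine.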
For example, when the association relationship between the target service table and the target main table is complex, or the target main table does not contain all the associated data required by the target service table, the associated data may instead be acquired from a target source table based on the source configuration information. The specific steps are as follows:
1) Query the source configuration information: look up, according to the service association configuration table, the source configuration information and the main table data identifier (such as a primary key id) associated with the target service table. The source configuration information typically includes the association rules between the target source table and the target main table, such as foreign key constraints and join conditions.
2) Construct the query conditions: build the query conditions for the target source table according to the source configuration information and the main table data identifier (such as the primary key id) corresponding to the target service table. These conditions typically match the foreign key fields in the target source table against the main table data identifier corresponding to the target service table.
3) Retrieve the source table data: retrieve the relevant associated data in the target source table using the constructed query conditions. This step may likewise be accomplished by query statements, depending on the database or storage type used for the target source table.
4) Integrate the associated data: integrate the associated data retrieved from the target source table with the data in the service table to obtain the complete associated data. This step may involve operations such as data de-duplication, merging and conversion so as to ensure the accuracy and consistency of the data.
5) Write back to the target main table: in some cases, the integrated associated data is written into the target main table to facilitate subsequent processing. This step is optional, depending on the specific service requirements and processing flow.
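The five steps above can be sketched as follows; the source configuration contents, table names and fields are hypothetical placeholders, and plain dicts stand in for real source tables:

```python
# Source configuration: association rule between a source table and its main
# table, expressed here as a foreign key field (an illustrative assumption).
source_config = {
    "employee_src": {"main_table": "employee", "foreign_key": "emp_id"},
}

# Source table rows holding the complete data.
employee_src_rows = [
    {"emp_id": 101, "name": "Alice", "phone": "555-0101"},
    {"emp_id": 102, "name": "Bob", "phone": "555-0102"},
]

def fetch_from_source(source_table: str, pk_id: int, partial: dict) -> dict:
    """Step 2: build the query condition from the source configuration;
    step 3: retrieve matching rows; step 4: merge with the partial data."""
    cfg = source_config[source_table]
    condition = {cfg["foreign_key"]: pk_id}           # constructed query condition
    rows = [r for r in employee_src_rows
            if all(r.get(k) == v for k, v in condition.items())]
    merged = dict(partial)
    for row in rows:                                  # integrate the retrieved data
        for field, value in row.items():
            if field != cfg["foreign_key"]:
                merged.setdefault(field, value)       # de-duplicate: keep existing values
    return merged

complete = fetch_from_source("employee_src", 101, {"name": "Alice"})
```

Step 5, writing the integrated result back to the main table, would be one extra dictionary update and is omitted here.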
For example, suppose a change involves a target service table (an employee information table) whose service fields include employee name, employee phone number, employer company 1-date, employer company 2-date, and the like. Associated with this service table are two target main tables (an employee table and a work record table), and associated with these two target main tables are two target source tables (likewise an employee table and a work record table). The data recorded in the two target main tables is incomplete, while the data recorded in the two target source tables is complete. When the associated data is acquired from the target source tables based on the source configuration information, the main table data identifier (such as the primary key id) and the source configuration information corresponding to the target service table (the employee information table) are first queried according to the service association configuration table. The main table data identifier is matched to target main table 1 (the employee table) and target main table 2 (the work record table) corresponding to the target service table. When the service fields of the target service table fail to match data in the two target main tables, or the matched data is incomplete, the association relationship between source tables and main tables recorded in the source configuration information is used to locate target source table 1 (the employee table) corresponding to target main table 1 and target source table 2 (the work record table) corresponding to target main table 2.
The relevant employee data and work record data are then retrieved from target source table 1 (the employee table) and target source table 2 (the work record table), and the retrieved data is integrated, converted and otherwise processed to obtain the complete associated data.
In summary, the associated data in step 135 may be obtained either directly from the target main table or from the target source table based on the source configuration information, according to the service requirements and the data processing flow. The choice between the two depends on the specific service scenario and data relationships.
Step 136: process the associated data, and convert the processed associated data into service data that meets the service requirements and can be identified by the target search engine.
For example, the associated data is subjected to the necessary processing (e.g., formatting, de-duplication, merging and splitting) to convert it into a format that meets the service requirements and can be efficiently indexed by the target search engine. The processed data is then sent to the target search engine for indexing and storage.
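A minimal sketch of such a conversion step, assuming hypothetical field names and formatting rules (trimming whitespace, and splitting a composite "company|date" value into two indexable fields):

```python
def to_service_doc(associated: dict) -> dict:
    """Format and split associated data into a flat document that a
    search engine can index (rules here are illustrative assumptions)."""
    doc = {}
    for key, value in associated.items():
        if isinstance(value, str):
            value = value.strip()                  # formatting: trim whitespace
        # splitting: a "company|date" composite value becomes two fields
        if key.endswith("_date") and isinstance(value, str) and "|" in value:
            company, date = value.split("|", 1)
            doc[key.replace("_date", "_company")] = company
            doc[key] = date
        else:
            doc[key] = value
    return doc

doc = to_service_doc({"name": " Alice ", "employer_1_date": "Acme|2020-01"})
```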
For service tables with complex associations (e.g., a customer table), efficiency can be improved by optimizing the data processing flow when large amounts of data need to be updated in batches. For example, in one practical case in which 482 rows of data were updated in batches, involving writes to 1 source table and 4 service tables, the total time of writing into the ES was optimized from 146 seconds to 84 seconds using a serial update mode, and the average update time per row of data was reduced to 150 milliseconds. This optimization results from careful design of the data processing flow and full exploitation of the target search engine's characteristics.
In some embodiments, the querying, for each target business table involving a change, a master table data identifier according to the business association configuration table includes:
For each target business table involving a change, a coroutine concurrency mechanism is used to query the master table data identifier according to the business association configuration table.
In order to further improve data processing efficiency, particularly in scenarios involving multiple business tables and complex association relationships, the service may use a coroutine concurrency mechanism to optimize the query flow. This mechanism allows the service to process multiple tasks simultaneously, thereby significantly increasing the overall processing speed.
For each target business table involving a change, the service may launch multiple coroutines or threads simultaneously to process the configuration information queries and associated configuration information queries for those tables in parallel. The service may initiate multiple query tasks in parallel, each of which is responsible for querying one or more master table data identifiers. This parallel processing mode further improves query efficiency and ensures that the service can quickly obtain all the needed master table data identifiers.
For example, after all necessary information is obtained, the service converts the data in the change log, in parallel, into the business data format recognizable by the target search engine. This step may also utilize a coroutine or thread pool to accelerate processing.
As shown in fig. 2, assume a scenario including multiple business table configurations, each corresponding to multiple master table ids. For example, the business table configurations include business table configuration A, business table configuration B and business table configuration C, and the master table ids corresponding to business table configuration A include master table idA1, master table idA2 and master table idA3. When the query logic of the multiple business table configurations is heavy, modifying the query mode would require reconstructing the entire data service, so that optimization has a poor cost-benefit ratio; and if a traditional sequential query mode is adopted, the processing time increases significantly.
To optimize this scenario, coroutines (or another concurrent processing technique) may be employed to process the query of each business table and the query of each master table data id in parallel. A coroutine is a lightweight unit of execution that can run concurrently within the same thread. By using coroutines, the service can process multiple query tasks simultaneously without the overhead of traditional thread switching.
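The coroutine-based parallel query described above can be sketched with Python's asyncio, one coroutine per business table configuration; the configuration names, id values and simulated query delay are illustrative assumptions:

```python
import asyncio

async def query_master_ids(table_config: str) -> list[str]:
    """Simulates querying the master table ids for one business table config."""
    await asyncio.sleep(0.01)  # stands in for a real database round trip
    return [f"{table_config}_id{i}" for i in (1, 2, 3)]

async def query_all(configs: list[str]) -> dict[str, list[str]]:
    # One coroutine per business table configuration, executed concurrently
    # instead of sequentially.
    results = await asyncio.gather(*(query_master_ids(c) for c in configs))
    return dict(zip(configs, results))

all_ids = asyncio.run(query_all(["A", "B", "C"]))
```

With a sequential query the total latency would be the sum of the per-table round trips; with `asyncio.gather` it approaches the slowest single round trip.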
In practical applications, this coroutine concurrency mechanism brings a significant performance improvement. For example, in a scenario containing a crm_customer source table and 16 business tables, when 511 rows of data need to be written, a service using the coroutine concurrency mechanism can complete the write operation in a total of 26 seconds, with an average update time of only 50 milliseconds per row of data. This result demonstrates the effectiveness of the coroutine concurrency mechanism in handling complex business table configurations and association relationships.
In summary, by adopting the coroutine concurrency mechanism, multiple query tasks can be processed in parallel, thereby remarkably improving data processing efficiency. The mechanism is particularly suitable for scenarios involving multiple business tables and complex association relationships, and can bring a notable performance improvement.
Step 140: write the service data into the memory of the target search engine.
For example, the converted business data is written into the memory of the target search engine. This step increases the speed of data retrieval and response: because memory access is far faster than disk access, storing data in memory can significantly improve query performance.
In some embodiments, the writing the business data into the memory of the target search engine includes:
determining a target source table and a plurality of target service tables related to the service data;
And asynchronously writing the target source table and the target service tables related to the service data into the memory of the target search engine.
For example, the business data may relate to one or more source tables and a plurality of business tables. A source table generally refers to a table where the original data resides, while a business table refers to a data table processed according to business logic. The system needs to determine all the target source tables and target business tables involved in the business data, and ensure that all relevant data can be correctly written into memory.
For example, to further increase the efficiency of data writing, business data may be written into the memory of the search engine in an asynchronous manner. Asynchronous writing means that the data write operation does not block execution of the main thread, but is performed in a background thread. This avoids response delays caused by write operations and improves system throughput. The system may buffer the data to be written using a message queue, log system or other middleware, and then asynchronously write the data to memory in batches to reduce the overhead of write operations.
For example, to further improve the efficiency of writing into memory, a batch writing manner may be adopted, in which a certain amount of data is packed each time and then written into memory at once.
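A minimal sketch combining the asynchronous and batch writing described above, with an in-memory list standing in for the search engine's memory and an assumed batch size of 2 (not a real Elasticsearch client):

```python
import asyncio

BATCH_SIZE = 2                       # assumed packing size for illustration
engine_memory: list[dict] = []       # stands in for the search engine's memory

async def flush(batch: list[dict]) -> None:
    """Write one packed batch to memory without blocking the producer."""
    await asyncio.sleep(0)           # simulated background I/O
    engine_memory.extend(batch)

async def write_async(rows: list[dict]) -> None:
    buffer: list[dict] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= BATCH_SIZE:   # pack a batch, then write it at once
            await flush(buffer)
            buffer = []
    if buffer:                          # flush the final partial batch
        await flush(buffer)

asyncio.run(write_async([{"id": i} for i in range(5)]))
```

In a real deployment the buffer would typically be fed by a message queue and flushed by a background worker, as the text notes.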
In addition, memory usage can be monitored and the size of the memory buffer dynamically adjusted so as to balance memory usage against query performance.
In some high-throughput application scenarios, the refresh interval of Elasticsearch may also be configured to control how frequently data in memory is written to disk, so as to optimize data synchronization between memory and disk.
For example, the persistence settings of Elasticsearch may also be configured to periodically flush the data in memory to disk, or replica shards may be used to provide redundant storage of the data.
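For illustration, the Elasticsearch tuning knobs mentioned here can be expressed as an index settings payload; the option names (`refresh_interval`, `number_of_replicas`, translog durability) are real Elasticsearch settings, but the values below are placeholders to be tuned per workload, not recommendations from the embodiment:

```python
# Index settings payload as it would be sent to Elasticsearch's
# update-settings API; values are illustrative placeholders.
index_settings = {
    "index": {
        "refresh_interval": "30s",            # how often in-memory data becomes searchable
        "number_of_replicas": 1,              # replica shards for redundant storage
        "translog": {
            "durability": "async",            # trade strict per-request durability
            "sync_interval": "10s",           # for higher write throughput
        },
    }
}
```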
By asynchronously writing the business data into the memory of the target search engine, fast data retrieval can be provided, meeting users' requirements for real-time data access. This design also improves the response speed and throughput of the system and enhances the user experience.
Step 150: in response to a query request, refresh the business data in the memory of the target search engine to the file cache area of the target search engine, and display a data service list containing the business data.
For example, when a query request is received, the business data is retrieved from the memory of the target search engine. To ensure data durability and consistency, the data in memory can first be refreshed into the file cache. The business data is then filtered and sorted according to the query conditions, and finally a data service list containing the business data is generated and displayed to the user or caller. This ensures that the user obtains the latest and most accurate data.
In some embodiments, the refreshing the business data in the memory of the target search engine to the file cache of the target search engine in response to the query request, and displaying a data service list containing the business data, includes:
after receiving the response ending information of the change request, calling a query request;
Responding to the query request, refreshing the business data in the memory of the target search engine to a file cache area of the target search engine;
Retrieving the service data from the file cache according to the query request;
packaging the business data into a data format meeting the front-end display requirement to obtain a data service list containing the business data;
And displaying a data service list containing the business data.
A data query is typically preceded by a data change process (such as a data insertion, update or deletion). When the change process is completed, the system receives a response end message for the change request. At this point, the system may invoke a query request to obtain the latest business data.
In order to ensure data durability and consistency, when a query request is received the system refreshes the business data in the memory of the target search engine into the file cache area. Although data in memory can be accessed quickly, it is lost once the system crashes, loses power or encounters other abnormal conditions. By refreshing the data into the file cache, the data can be recovered from the file cache even if the system fails.
After the data is refreshed into the file cache, the system retrieves the relevant service data from the file cache according to the conditions (such as keywords or a time range) in the query request.
The retrieved service data is usually stored in a raw format; to meet the requirements of front-end presentation, the data needs to be encapsulated into a specific data format (such as JSON or XML). After encapsulation, the system filters and sorts the service data according to the query conditions, and finally generates a data service list containing the service data.
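The retrieve-filter-sort-encapsulate flow can be sketched as follows; the cached rows, the keyword filter and the JSON shape of the data service list are all assumptions for illustration:

```python
import json

# Stand-in for the file cache contents after the refresh.
file_cache = [
    {"name": "Alice", "updated": "2024-02-01"},
    {"name": "Bob", "updated": "2024-01-15"},
    {"name": "Alice Wu", "updated": "2024-03-10"},
]

def query_service_list(keyword: str) -> str:
    """Filter by keyword, sort newest first, and encapsulate as JSON
    for front-end presentation."""
    hits = [r for r in file_cache if keyword.lower() in r["name"].lower()]
    hits.sort(key=lambda r: r["updated"], reverse=True)
    return json.dumps({"total": len(hits), "list": hits})

payload = query_service_list("alice")
```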
After the data service list is generated, the system can display it to the user or caller, and the user can quickly obtain the required information through a clear and intuitive presentation.
For example, to ensure that the user can obtain the latest data in real time, the back-end service may refresh the data in memory to the file cache in real time through a forced refresh. Although such forced refresh operations add some performance overhead, the overall cost is controllable because the call frequency is not high (e.g., 1000 times per day on average). Through the forced refresh operation, the system can remarkably reduce the latency of the real-time data query interface, from 3 seconds to 200 milliseconds. This optimization greatly improves the user experience: the user can perceive data updates in real time, and the responsiveness and interactivity of the system are enhanced.
As shown in fig. 3, the system updates the data in the ES as soon as possible after the data in the target database is modified, through two main phases.
The first phase covers data change and asynchronous writing to the target search engine (ES):
Front end initiates a change request: the front end initiates a change request corresponding to the target data to the system;
Change request processing: in response to the change request, the system stores the target data into a target database (such as MySQL) and generates a change log corresponding to the target data, where the change log is used to record the change operation of the target data in the target database;
Change log capturing: after the data in the target database is modified, a data channel (such as Canal) pulls the change log (binlog) of the database in real time;
Data service processing: the pulled change log is processed as necessary by the data service; for example, the target data is converted into service data identifiable by the target search engine (ES) according to the change log and the data table configuration information, where the data table configuration information is acquired when the service is started and is cached in the file cache area of the target search engine (ES);
Asynchronous writing: the processed service data is asynchronously written into the file cache area of the target search engine (ES), thereby ensuring the real-time performance and consistency of data updates.
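The first phase can be sketched end to end as a single change-log handler; the binlog entry shape and table configuration below are simplified assumptions (a real deployment would pull entries via Canal and write to Elasticsearch):

```python
# Cached data table configuration (loaded at service startup).
table_config = {"orders": {"index": "orders_idx", "fields": ["id", "amount"]}}
engine_memory: dict[str, list[dict]] = {}   # stand-in for the search engine

def handle_binlog(entry: dict) -> None:
    """Convert one change-log entry into engine-recognizable service data
    and append it to the engine's in-memory index."""
    cfg = table_config[entry["table"]]
    # keep only the configured fields, i.e. the conversion step
    doc = {f: entry["row"][f] for f in cfg["fields"] if f in entry["row"]}
    engine_memory.setdefault(cfg["index"], []).append(doc)

handle_binlog({"table": "orders", "op": "insert",
               "row": {"id": 1, "amount": 9.5, "internal": "x"}})
```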
The second phase covers front-end query and data refresh and display:
Front end initiates a query request: after the change request returns to the front end, the front end is triggered to initiate a data query request;
Query request processing: in response to the query request, the data service query interface is called;
Refreshing memory to the file cache area: to improve query efficiency, the system first checks whether the memory of the target search engine (ES) contains the latest service data, and refreshes the service data in memory to the file cache area;
Displaying the data service list: the data service list containing the latest business data is displayed to the front-end user, ensuring that the user can immediately see the latest query result.
Through these two phases, the system realizes real-time updating of the data in the ES following database data changes, and meets the requirement that the front-end user can immediately query the latest data.
The embodiment of the application can significantly improve the efficiency of data processing and retrieval, while ensuring the real-time performance and consistency of data, by combining general configuration data caching, coroutine concurrent processing and a refresh operation during query.
Metadata information such as database table structures, field mappings and index configurations often needs to be accessed frequently when handling database changes (e.g., the binlog of MySQL). Since this information does not change frequently, it can be cached to reduce the number of database accesses and improve processing efficiency. By implementing a general configuration data caching mechanism, the configuration information can be loaded into memory when the service is started or when the configuration changes, and read directly from the cache when binlog changes are processed, thereby avoiding frequent database accesses and remarkably improving the processing rate.
When a large number of binlog changes are processed, a coroutine concurrency mode may be adopted to improve processing efficiency, allocating the processing task of each binlog change to one coroutine for execution. Coroutines are lightweight execution units that can concurrently execute multiple tasks within the same thread without the overhead of context switching. By reasonably allocating the number of coroutines, the computing capability of a multi-core CPU can be fully utilized and parallel processing realized, thereby significantly improving the processing rate of binlog changes.
In order to ensure that the read data is up to date when retrieving data using a search engine (e.g., Elasticsearch, ES for short), a refresh operation may be forced at query time. The refresh operation flushes the data in memory into the file buffer of the ES so that it becomes retrievable. Since the in-memory index of the ES is much faster than the disk index, the refresh operation ensures that the queried data is up to date.
All the above technical solutions may be combined to form an optional embodiment of the present application, and will not be described in detail herein.
In the embodiment of the application, the data table configuration information is read when the service is started and cached in the file cache area of the target search engine. In response to a change request corresponding to target data, the target data is stored in a target database and a change log corresponding to the target data is generated, the change log being used to record the change operation of the target data in the target database. The change log in the target database is pulled through a data channel, and the target data is converted into service data identifiable by the target search engine according to the change log and the data table configuration information. The service data is written into the memory of the target search engine; in response to a query request, the service data in the memory of the target search engine is refreshed to the file cache area of the target search engine, and a data service list containing the service data is displayed. By reading and caching the data table configuration information at service startup and forcibly refreshing the data in the target search engine when the query request is invoked, the application remarkably reduces the delay from data change to display; the concurrent processing mechanism accelerates the data writing and query process and improves the overall efficiency of data processing; and by reducing data delay so that the user sees data changes almost in real time on the front-end interface, the user experience is greatly improved. The embodiment of the application can thus reduce data delay, improve data processing efficiency and achieve real-time data display.
In order to facilitate better implementation of the data processing method according to the embodiment of the present application, the embodiment of the present application further provides a data processing device. Referring to fig. 4, fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the application. Wherein the data processing apparatus 200 may include:
the first processing unit 210 is configured to read the data table configuration information when the service is started, and cache the data table configuration information in a file cache area of the target search engine;
A change unit 220, configured to respond to a change request corresponding to target data, store the target data into a target database, and generate a change log corresponding to the target data, where the change log is used to record a change operation of the target data in the target database;
the second processing unit 230 is configured to pull the change log in the target database through a data channel, and convert the target data into service data identifiable by a target search engine according to the change log and the data table configuration information;
a writing unit 240, configured to write the service data into the memory of the target search engine;
and the query unit 250 is configured to respond to a query request, refresh the service data in the memory of the target search engine to a file cache of the target search engine, and display a data service list containing the service data.
In some embodiments, the second processing unit 230 may be configured to:
pulling the change log in the target database through a data channel;
constructing change data according to the change log, wherein the change data comprises detailed information of data change;
Writing the change data into a target source table of the target search engine;
Inquiring a service association configuration table associated with the changed data from the data table configuration information, wherein the service association configuration table is used for recording the association relation between service configuration information corresponding to the service table and source configuration information corresponding to the source table, the service configuration information is used for defining the association relation between the service table and the main table, and the source configuration information is used for defining the association relation between the source table and the main table in the database;
For each target business table related to the change, inquiring a main table data identifier according to the business association configuration table, and inquiring corresponding target main table and associated data in the target main table according to the main table data identifier;
And processing the associated data, and converting the processed associated data into service data which can be identified by a target search engine and meets the service requirement.
In some embodiments, the second processing unit 230 may be configured to:
For each target business table involving a change, a coroutine concurrency mechanism is used to query the master table data identifier according to the business association configuration table.
In some embodiments, the writing unit 240 may be configured to:
determining a target source table and a plurality of target service tables related to the service data;
And asynchronously writing the target source table and the target service tables related to the service data into the memory of the target search engine.
In some embodiments, the first processing unit 210 may be configured to:
reading data table configuration information when the service is started;
initializing the data channel;
and caching the data table configuration information to a file cache area of the target search engine.
In some embodiments, the query unit 250 may be configured to:
after receiving the response ending information of the change request, calling a query request;
Responding to the query request, refreshing the business data in the memory of the target search engine to a file cache area of the target search engine;
Retrieving the service data from the file cache according to the query request;
packaging the business data into a data format meeting the front-end display requirement to obtain a data service list containing the business data;
And displaying a data service list containing the business data.
All the above technical solutions may be combined to form an optional embodiment of the present application, and will not be described in detail herein.
It will be appreciated that data processing apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may be made with reference to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the data processing apparatus may execute the above-mentioned data processing method embodiment, and the foregoing and other operations and/or functions of each unit in the data processing apparatus implement respective flows of the above-mentioned method embodiment, which are not described herein for brevity.
Optionally, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the above method embodiments when executing the computer program.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application, where the computer device may be a terminal or a server. As shown in fig. 5, the computer device 300 may include a communication interface 301, a memory 302, a processor 303, and a communication bus 304. Communication interface 301, memory 302, and processor 303 enable communication with each other via communication bus 304. The communication interface 301 is used for data communication between the computer device 300 and an external device. The memory 302 may be used to store software programs and modules, and the processor 303 may execute the software programs and modules stored in the memory 302, such as the software programs for corresponding operations in the foregoing method embodiments.
Alternatively, the processor 303 may call a software program and module stored in the memory 302 to perform the following operations:
The method comprises the steps of reading data table configuration information when a service is started, caching the data table configuration information in a file cache area of a target search engine, responding to a change request corresponding to target data, storing the target data in a target database, generating a change log corresponding to the target data, wherein the change log is used for recording change operation of the target data in the target database, pulling the change log in the target database through a data channel, converting the target data into service data identifiable by the target search engine according to the change log and the data table configuration information, writing the service data in a memory of the target search engine, responding to a query request, refreshing the service data in the memory of the target search engine to the file cache area of the target search engine, and displaying a data service list containing the service data.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer readable storage medium having stored therein a plurality of computer programs that can be loaded by a processor to perform the steps of any of the data processing methods provided by the embodiments of the present application. The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The steps of any data processing method provided by the embodiment of the present application can be executed by the computer program stored in the storage medium, so that the beneficial effects of any data processing method provided by the embodiment of the present application can be achieved, and detailed descriptions of the foregoing embodiments are omitted.
Embodiments of the present application also provide a computer program product comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the corresponding flow of any data processing method in the embodiments of the present application; for brevity, details are not repeated here.
Embodiments of the present application also provide a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the corresponding flow of any data processing method in the embodiments of the present application; for brevity, details are not repeated here.
The foregoing is merely a description of embodiments of the present application and is not intended to limit it; any variations or substitutions readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method of data processing, the method comprising:
reading data table configuration information when a service is started, and caching the data table configuration information to a file cache area of a target search engine;
responding to a change request corresponding to target data, storing the target data into a target database, and generating a change log corresponding to the target data, wherein the change log is used for recording change operations on the target data in the target database;
pulling the change log in the target database through a data channel, and converting the target data into service data identifiable by the target search engine according to the change log and the data table configuration information;
writing the service data into the memory of the target search engine;
and responding to a query request, refreshing the service data in the memory of the target search engine to the file cache area of the target search engine, and displaying a data service list containing the service data.
2. The data processing method as claimed in claim 1, wherein the pulling the change log in the target database through a data channel and converting the target data into service data identifiable by a target search engine according to the change log and the data table configuration information comprises:
pulling the change log in the target database through a data channel;
constructing change data according to the change log, wherein the change data comprises detailed information of the data change;
writing the change data into a target source table of the target search engine;
querying, from the data table configuration information, a service association configuration table associated with the change data, wherein the service association configuration table is used for recording the association relation between service configuration information corresponding to the service table and source configuration information corresponding to the source table, the service configuration information is used for defining the association relation between the service table and the master table, and the source configuration information is used for defining the association relation between the source table and the master table in the database;
for each target service table involved in the change, querying a master table data identifier according to the service association configuration table, and querying, according to the master table data identifier, a corresponding target master table and associated data in the target master table;
and processing the associated data, and converting the processed associated data into service data which is identifiable by the target search engine and meets the service requirement.
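The flow of claim 2 can be sketched as follows. The table layouts and field names (`service_assoc`, `master_id`, the dictionary-backed source and master tables) are assumptions made for illustration only, not the actual schema.

```python
# Hedged sketch of claim 2's change-data construction and association lookup.

def apply_change(log_entry, table_config, source_tables, master_tables):
    # Construct change data (detailed information of the change)
    # from the pulled change log entry.
    change = {"table": log_entry["table"], "row": log_entry["row"]}
    # Write the change data into the target source table.
    source_tables.setdefault(change["table"], []).append(change["row"])
    # Query the service association configuration table associated
    # with the change data from the table configuration.
    assoc = table_config["service_assoc"][change["table"]]
    service_data = []
    for service_table, master_table in assoc.items():
        # Query the master table data identifier, then the associated
        # data in the corresponding target master table.
        master_id = change["row"]["master_id"]
        associated = master_tables[master_table][master_id]
        # Process and convert the associated data into service data
        # identifiable by the search engine.
        service_data.append({"service_table": service_table,
                             "id": master_id,
                             "data": dict(associated)})
    return service_data


config = {"service_assoc": {"orders_src": {"order_svc": "orders_master"}}}
masters = {"orders_master": {"m1": {"status": "paid"}}}
sources = {}
out = apply_change({"table": "orders_src", "row": {"master_id": "m1"}},
                   config, sources, masters)
```

The key point is that the association configuration, not the change itself, decides which service tables and master tables are touched, so new service views can be added by configuration alone.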
3. The data processing method of claim 2, wherein, for each target service table involved in the change, the querying a master table data identifier according to the service association configuration table comprises:
for each target service table involved in the change, using a coroutine-based concurrency mechanism to query the master table data identifier according to the service association configuration table.
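A coroutine-based lookup of this kind can be sketched with `asyncio`: one coroutine per target service table, all master table data identifiers queried concurrently. The function and table names are illustrative assumptions; a real lookup would hit the service association configuration store rather than a local dictionary.

```python
import asyncio

async def query_master_id(service_table, assoc_config):
    # Stands in for a lookup against the service association
    # configuration table; simulated here with a dictionary read.
    await asyncio.sleep(0)
    return service_table, assoc_config[service_table]

async def query_all(service_tables, assoc_config):
    # One coroutine per target service table, executed concurrently.
    tasks = [query_master_id(t, assoc_config) for t in service_tables]
    return dict(await asyncio.gather(*tasks))

assoc_config = {"svc_a": "master_1", "svc_b": "master_2"}
master_ids = asyncio.run(query_all(["svc_a", "svc_b"], assoc_config))
```

Coroutines keep the lookups on one thread while overlapping their I/O waits, which suits many small configuration queries better than a thread per table.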
4. The data processing method as claimed in claim 2, wherein the writing the service data into the memory of the target search engine comprises:
determining a target source table and a plurality of target service tables related to the service data;
and asynchronously writing the target source table and the plurality of target service tables related to the service data into the memory of the target search engine.
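The asynchronous write of claim 4 can be sketched the same way: the target source table and all related target service tables are written to the engine's memory concurrently rather than one after another. Table names and the dictionary-backed engine memory are assumptions for illustration.

```python
import asyncio

async def write_table(engine_memory, table_name, rows):
    # Stands in for an asynchronous bulk write into the search
    # engine's memory.
    await asyncio.sleep(0)
    engine_memory[table_name] = rows

async def async_write(engine_memory, source_table, service_tables):
    # Source table and every related service table are written
    # concurrently instead of sequentially.
    name, rows = source_table
    await asyncio.gather(
        write_table(engine_memory, name, rows),
        *[write_table(engine_memory, t, r) for t, r in service_tables.items()],
    )

engine_memory = {}
asyncio.run(async_write(
    engine_memory,
    ("orders_src", [{"id": 1}]),
    {"order_svc": [{"id": 1}], "stats_svc": [{"id": 1}]},
))
```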
5. The data processing method as claimed in claim 1, wherein the reading data table configuration information when a service is started and caching the data table configuration information in the file cache area of the target search engine comprises:
reading data table configuration information when the service is started;
initializing the data channel;
and caching the data table configuration information to a file cache area of the target search engine.
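Claim 5's start-up order reduces to three steps: read the configuration, initialize the data channel, then cache the configuration in the engine's file cache area. The sketch below makes that ordering explicit; all names and the lambda stand-ins are illustrative assumptions.

```python
# Sketch of claim 5's service start-up sequence; names are assumptions.

def start_service(read_config, init_channel, engine_file_cache):
    config = read_config()                        # 1) read data table configuration
    channel = init_channel()                      # 2) initialize the data channel
    engine_file_cache["_table_config"] = config   # 3) cache config in file cache area
    return channel

file_cache = {}
channel = start_service(
    read_config=lambda: {"orders": {"source": "orders_src"}},
    init_channel=lambda: "channel-ready",
    engine_file_cache=file_cache,
)
```

Initializing the channel before caching the configuration means the channel can start pulling change logs as soon as the cached configuration is available to interpret them.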
6. The data processing method according to any one of claims 1 to 5, wherein the refreshing the service data in the memory of the target search engine to the file cache area of the target search engine and displaying a data service list containing the service data in response to a query request comprises:
after receiving response-end information of the change request, invoking a query request;
responding to the query request, refreshing the service data in the memory of the target search engine to the file cache area of the target search engine;
retrieving the service data from the file cache area according to the query request;
packaging the service data into a data format meeting the front-end display requirement to obtain a data service list containing the service data;
and displaying the data service list containing the service data.
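The query flow of claim 6 can be sketched as refresh, retrieve, then package for the front end. The dictionary-backed engine and the packaging callback are assumptions for illustration, not the actual interfaces.

```python
# Sketch of claim 6's query path; engine layout and field names are assumptions.

def handle_query(engine, package):
    # Refresh: move in-memory service data to the file cache area so
    # the newest data is visible to this query.
    engine["file_cache"].update(engine["memory"])
    engine["memory"].clear()
    # Retrieve from the file cache area, then package each row into
    # the format required for front-end display.
    return [package(row) for row in engine["file_cache"].values()]

engine = {"memory": {"o1": {"id": "o1", "amount": 10}}, "file_cache": {}}
service_list = handle_query(
    engine,
    lambda row: {"id": row["id"], "display": f"amount={row['amount']}"},
)
```

Driving the refresh from the query (after the change response ends) is what makes just-written changes appear in the displayed data service list without a fixed refresh interval.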
7. A data processing apparatus, the apparatus comprising:
The first processing unit is used for reading the data table configuration information when the service is started and caching the data table configuration information to a file cache area of the target search engine;
The change unit is used for responding to a change request corresponding to target data, storing the target data into a target database, and generating a change log corresponding to the target data, wherein the change log is used for recording change operation of the target data in the target database;
The second processing unit is used for pulling the change log in the target database through a data channel, and converting the target data into service data identifiable by the target search engine according to the change log and the data table configuration information;
The writing unit is used for writing the service data into the memory of the target search engine;
And the query unit is used for responding to a query request, refreshing the service data in the memory of the target search engine to the file cache area of the target search engine, and displaying a data service list containing the service data.
8. A computer device, characterized in that it comprises a processor and a memory, in which a computer program is stored, the processor being arranged to execute the data processing method according to any of claims 1-6 by invoking the computer program stored in the memory.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when loaded and executed by a processor, performs the data processing method according to any one of claims 1-6.
10. A computer program product comprising computer instructions which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411820869.0A CN119961502A (en) | 2024-12-11 | 2024-12-11 | Data processing method, device, equipment, storage medium and program product |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411820869.0A CN119961502A (en) | 2024-12-11 | 2024-12-11 | Data processing method, device, equipment, storage medium and program product |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119961502A true CN119961502A (en) | 2025-05-09 |
Family
ID=95588754
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411820869.0A Pending CN119961502A (en) | 2024-12-11 | 2024-12-11 | Data processing method, device, equipment, storage medium and program product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119961502A (en) |
- 2024-12-11: CN CN202411820869.0A patent/CN119961502A/en, active, Pending
Similar Documents
| Publication | Title |
|---|---|
| US11327962B1 (en) | Real-time analytical database system for querying data of transactional systems |
| CN111177161B (en) | Data processing method, device, computing equipment and storage medium |
| JP6266630B2 (en) | Managing continuous queries with archived relations |
| CN108920659B (en) | Data processing system and data processing method thereof, and computer-readable storage medium |
| WO2021184761A1 (en) | Data access method and apparatus, and data storage method and device |
| CN111026727A (en) | Table dimension retrieval data synchronization method, system and device based on log file |
| US10860562B1 (en) | Dynamic predicate indexing for data stores |
| US8903782B2 (en) | Application instance and query stores |
| WO2021147935A1 (en) | Log playback method and apparatus |
| CN111753141B (en) | A data management method and related equipment |
| CN111752920A (en) | Method, system and storage medium for managing metadata |
| CN104199978A (en) | System and method for realizing metadata cache and analysis based on NoSQL |
| CN112269802A (en) | A method and system for frequent deletion, modification and search optimization based on Clickhouse |
| CN117422556B (en) | Derivative transaction system, device and computer medium based on replication state machine |
| CN115952200A (en) | Multi-source heterogeneous data aggregation query method and device based on MPP architecture |
| CN115292415A (en) | Database access method and device |
| CN116126620A (en) | Database log processing method, database change query method and related devices |
| CN112286892B (en) | Data real-time synchronization method and device of post-relation database, storage medium and terminal |
| CN113778996A (en) | Large data stream data processing method and device, electronic equipment and storage medium |
| CN116414917B (en) | Data transmission method, device, equipment and storage medium based on Myhouse database |
| CN119961502A (en) | Data processing method, device, equipment, storage medium and program product |
| CN115168468B (en) | A real-time data warehouse ETL method for parallel parsing of business library logs |
| CN118170760A (en) | Efficient data storage and retrieval method for SaaS system |
| CN115658816A (en) | Method for synchronizing HBase data to QianBase MPP in real time |
| CN113934768A (en) | Target identification data query method, device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||