CN118691321B

CN118691321B - Online car market automobile substation data management and control platform based on edge computing

Info

Publication number: CN118691321B
Application number: CN202411162683.0A
Authority: CN
Inventors: 林渝奇; 胡文勇; 高飞飞; 潘越; 张有波; 仕育恺
Original assignee: Fir Tree Beijing Technology Co ltd
Current assignee: Fir Tree Beijing Technology Co ltd
Priority date: 2024-08-23
Filing date: 2024-08-23
Publication date: 2024-11-12
Anticipated expiration: 2044-08-23
Also published as: CN118691321A

Abstract

The present application provides an online car market automobile substation data management and control platform based on edge computing, and relates to the technical field of data transmission. The platform includes: a data extraction module that obtains historical data from K automobile substations; an edge computing module that processes data using supervised and unsupervised identifiers; a data labeling module that labels part of the data; first and second training modules that train the identifiers of the labeled and unlabeled substations respectively; a parameter distribution module and a parameter updating module that update the trained parameters on the management and control platform and distribute them back to the automobile substations; an analysis module acquisition module that obtains the trained data analysis modules after multiple training iterations; and a data uploading module that uploads the calculation results to the management and control platform. The application addresses the prior-art problem that different clients have different labeling capabilities and therefore suffer label scarcity to different degrees. It enables efficient processing and analysis of data at the edge and timely upload to the management and control platform, thereby improving data processing efficiency.

Description

Online car market automobile substation data management and control platform based on edge computing

Technical Field

The application relates to the technical field of data transmission, and in particular to an online car market automobile substation data management and control platform based on edge computing.

Background

With the rapid development of the automobile industry, and especially the rise of intelligent connected vehicles, the volume of automobile-related data has grown explosively. This data includes vehicle states, driving behaviors, road conditions and other information, and is important for improving the safety, comfort and intelligence of automobiles. However, conventional data processing methods cannot meet the automotive industry's requirements for real-time performance, efficiency and security.

At present, the existing data processing system of the automobile substation generally has the problems of low data processing efficiency, high data labeling cost, untimely model updating, data security risk and the like. The traditional centralized data processing mode needs to upload all data to a central server for processing, which not only increases the delay of data transmission, but also increases the processing burden of the server, and results in slow data processing speed. In addition, the updating of the model usually depends on the unified pushing of a central server, but due to network delay, server load and the like, the updating is often not timely enough, and the real-time performance and accuracy of the data analysis of the automobile substation are affected. Finally, frequent transmission of data also increases the risk of data disclosure, which poses a threat to privacy protection of users.

In summary, in the prior art, different clients have different labeling capabilities, so label scarcity arises to different degrees across clients. This in turn causes the gradient directions of model updates to conflict, which degrades the overall performance and user experience of the automobile substation data analysis system.

Disclosure of Invention

The application aims to provide an online car market automobile substation data management and control platform based on edge computing, in order to solve the prior-art problem that different clients, having different labeling capabilities, suffer label scarcity to different degrees, which causes the gradient directions of model updates to conflict and thereby degrades the overall performance and user experience of the automobile substation data analysis system.

In view of the above problems, the application provides an online car market automobile substation data management and control platform based on edge computing.

The application provides an online car market automobile substation data management and control platform based on edge computing, which comprises the following modules:

a data extraction module, configured to extract data of K automobile substations within a preset history window according to a preset index set, to obtain K automobile substation historical data sets;

an edge calculation module, configured to obtain K data analysis modules corresponding to the K automobile substations, wherein the K data analysis modules are configured to perform edge calculation on data collected by the K automobile substations, and the K data analysis modules comprise K supervised identifiers and K unsupervised identifiers;

a data labeling module, configured to randomly extract M automobile substation historical data sets from the K automobile substation historical data sets for data labeling, to obtain M labeled automobile substation historical data sets and N unlabeled automobile substation historical data sets;

an automobile substation module, configured to obtain, by mapping, M labeled automobile substations and N unlabeled automobile substations based on the M labeled automobile substation historical data sets and the N unlabeled automobile substation historical data sets;

a first training module, configured to train the M labeled supervised identifiers and the M labeled unsupervised identifiers of the M labeled automobile substations using the M labeled automobile substation historical data sets respectively, to obtain M labeled supervised identifier parameters and M labeled unsupervised identifier parameters;

a second training module, configured to train the N unlabeled supervised identifiers and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations using the N unlabeled automobile substation historical data sets respectively, to obtain N unlabeled supervised identifier parameters and N unlabeled unsupervised identifier parameters;

a parameter distribution module, configured to input the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters into a server of the management and control platform for parameter updating, to obtain updated supervised identifier parameters, and to distribute the updated supervised identifier parameters to the M labeled supervised identifiers of the M labeled automobile substations and the N unlabeled supervised identifiers of the N unlabeled automobile substations;

a parameter updating module, configured to input the M labeled unsupervised identifier parameters and the N unlabeled unsupervised identifier parameters into the server of the management and control platform for parameter updating, to obtain updated unsupervised identifier parameters, and to distribute the updated unsupervised identifier parameters to the M labeled unsupervised identifiers of the M labeled automobile substations and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations;

an analysis module acquisition module, configured to obtain M trained labeled data analysis modules and N trained unlabeled data analysis modules after multiple training iterations, once a preset number of training rounds is reached; and

a data uploading module, configured to perform edge calculation on the data collected by the K automobile substations based on the M labeled data analysis modules and the N unlabeled data analysis modules, and to upload the calculation results to the management and control platform.

One or more technical schemes provided by the application have at least the following technical effects or advantages:

Through the cooperation of the data extraction module, the edge calculation module, the data labeling module, the automobile substation module, the first training module, the second training module, the parameter distribution module, the parameter updating module, the analysis module acquisition module and the data uploading module set out above, the platform trains the supervised and unsupervised identifiers of the M labeled and N unlabeled automobile substations locally, updates their parameters on the server of the management and control platform, distributes the updated parameters back to the substations, and uploads the edge calculation results to the management and control platform. This effectively solves the prior-art problem that different clients, having different labeling capabilities, suffer label scarcity to different degrees, which causes the gradient directions of model updates to conflict and degrades the overall performance and user experience of the automobile substation data analysis system. Data is processed and analyzed efficiently at the edge and uploaded to the management and control platform in a timely manner, improving data processing efficiency.

The foregoing is only an overview of the present application, provided so that its technical means can be understood more clearly and implemented in accordance with the description, and so that its objects, features and advantages are more readily apparent. It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.

Drawings

In order to illustrate the technical solutions of the application or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely exemplary, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic structural diagram of the online car market automobile substation data management and control platform based on edge computing;

FIG. 2 is a schematic flow chart of obtaining the M labeled automobile substation historical data sets in the online car market automobile substation data management and control platform based on edge computing.

Reference numerals illustrate:

Data extraction module 11, edge calculation module 12, data labeling module 13, automobile substation module 14, first training module 15, second training module 16, parameter distribution module 17, parameter updating module 18, analysis module acquisition module 19 and data uploading module 20.

Detailed Description

The online car market automobile substation data management and control platform based on edge computing provided by the application solves the prior-art problem that different clients, having different labeling capabilities, suffer label scarcity to different degrees, which causes the gradient directions of model updates to conflict and degrades the overall performance and user experience of the automobile substation data analysis system. The platform processes and analyzes data efficiently at the edge and uploads it to the management and control platform in a timely manner, improving data processing efficiency.

In the following, the technical solutions of the present application will be clearly and completely described with reference to the accompanying drawings, and it should be understood that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application, and that the present application is not limited by the exemplary embodiments described herein. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. It should be further noted that, for convenience of description, only some, but not all of the drawings related to the present application are shown.

The application provides an online car market automobile substation data management and control platform based on edge computing. Referring to FIG. 1, the platform comprises:

The data extraction module 11 is configured to extract data of the K automobile substations according to a preset index set within a preset history window, so as to obtain K automobile substation history data sets.

Specifically, the preset history window is a particular past time period, such as a month, a year, or any other specified time range, within which data is extracted from each automobile substation. The preset index set is a set of predefined data indicators used to extract specific information from the data of the automobile substations; these indicators include vehicle sales indicators, user feedback indicators, market indicators, user behavior indicators, and the like. The K automobile substations are K different automobile sales or service sites from which data needs to be extracted. A specific time range is determined as the preset history window according to business requirements and the purpose of the analysis, and a connection is made to the database of each automobile substation. For each automobile substation, the data relating to the preset index set is then retrieved from its historical database, checking that the extracted data conforms to the definitions of the preset index set and is correctly formatted. This yields K data sets, each containing the historical data of the corresponding automobile substation within the preset history window.
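The extraction step can be illustrated with a minimal Python sketch. The `Record` type, its field names, the indicator names and the in-memory record list are all hypothetical stand-ins for a substation's historical database, which the application does not specify; the sketch only shows filtering by history window and index set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical row layout; the application does not define a schema.
    substation_id: str
    day: date
    indicator: str
    value: float


def extract_history(records, substation_ids, window_start, window_end, indicator_set):
    """Return one historical data set per substation, keyed by substation id."""
    history = {sid: [] for sid in substation_ids}
    for rec in records:
        in_window = window_start <= rec.day <= window_end
        if rec.substation_id in history and in_window and rec.indicator in indicator_set:
            history[rec.substation_id].append(rec)
    return history


records = [
    Record("A", date(2024, 5, 3), "sales_volume", 12.0),
    Record("A", date(2023, 1, 9), "sales_volume", 7.0),    # outside the window
    Record("B", date(2024, 6, 1), "complaint_rate", 0.02),
    Record("B", date(2024, 6, 2), "page_color", 1.0),      # not in the index set
]
history = extract_history(
    records,
    ["A", "B"],
    date(2024, 1, 1),
    date(2024, 6, 30),
    {"sales_volume", "complaint_rate"},
)
```

With K = 2 substations, `history` holds two data sets, each restricted to the window and to the indicators in the index set.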

The edge calculation module 12 is configured to obtain K data analysis modules corresponding to the K automobile substations, where the K data analysis modules are configured to perform edge calculation on data collected by the K automobile substations, and the K data analysis modules include K supervised identifiers and K unsupervised identifiers.

Specifically, the K data analysis modules include K supervised identifiers and K unsupervised identifiers. A supervised identifier is a model trained using labeled data; in this embodiment it may be used to identify specific sales patterns, predict sales, identify customer behavior, and so on. An unsupervised identifier is mainly used for pattern discovery, anomaly detection or cluster analysis and does not require pre-labeled data; in this embodiment it can help discover abnormal patterns in sales data, clusters of customer behavior, and the like.

The data labeling module 13 is configured to randomly extract M car substation history data sets from the K car substation history data sets to perform data labeling, and obtain M labeled car substation history data sets and N unlabeled car substation history data sets.

Specifically, the historical data sets extracted from the K automobile substations are raw, unlabeled data. A person skilled in the art determines, according to available resources and requirements, the number M of data sets to be labeled, where M is an integer smaller than K and represents the number of automobile substations whose data is to be labeled. M data sets are selected at random from the K automobile substation historical data sets using a random number generator, which keeps the selection random and representative. The M extracted data sets are then labeled. Data labeling means adding corresponding tags or annotations to the data according to the requirements of the specific task. For example, in automobile sales data, the labels include the car model, the sales status (sold or unsold), the customer type, and the like. After labeling is completed, M labeled automobile substation historical data sets are available, and the remaining K-M data sets remain unlabeled. Finally, M labeled automobile substation historical data sets and N = K-M unlabeled automobile substation historical data sets are obtained.
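As a minimal sketch of the random selection, the following Python function picks M of K substation identifiers for labeling and leaves N = K - M unlabeled. The function name, the identifier format and the `seed` parameter are illustrative assumptions; the lower bound M > 0 is also an assumption, since the text only requires M to be smaller than K.

```python
import random


def split_labeled_unlabeled(substation_ids, m, seed=None):
    """Randomly choose m substations for labeling; the rest stay unlabeled."""
    k = len(substation_ids)
    if not 0 < m < k:  # M < K comes from the text; M > 0 is an assumption
        raise ValueError("M must satisfy 0 < M < K")
    rng = random.Random(seed)
    chosen = set(rng.sample(substation_ids, m))
    labeled = [sid for sid in substation_ids if sid in chosen]
    unlabeled = [sid for sid in substation_ids if sid not in chosen]
    return labeled, unlabeled


ids = [f"substation-{i}" for i in range(10)]  # K = 10
labeled, unlabeled = split_labeled_unlabeled(ids, m=4, seed=0)
```

Passing a fixed `seed` makes the split reproducible, which can be convenient when the labeling plan has to be audited later.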

The automobile substation module 14 is configured to obtain, by mapping, M labeled automobile substations and N unlabeled automobile substations based on the M labeled automobile substation historical data sets and the N unlabeled automobile substation historical data sets.

Specifically, there is a one-to-one correspondence between the historical data set of each automobile substation and the corresponding automobile substation, and this mapping relationship is fixed, i.e., one automobile substation corresponds to one data set. It is possible to determine which car substations are annotated and which are unlabeled directly from the annotated and unlabeled historical data sets. A mapping table is created listing the correspondence between all K auto substations and their respective sets of historical data. This table may be a simple database table, spreadsheet or dictionary structure, where the keys are identifiers of the automobile substations and the values are identifiers of the corresponding data sets. A field is added to the mapping table to record the labeling status, e.g., labeled or unlabeled, of each data set.
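The dictionary form of the mapping table can be sketched as follows. The identifiers, the field names `dataset_id` and `status`, and both helper functions are hypothetical; the sketch only shows keys as substation identifiers, values as data set identifiers, and an added labeling-status field.

```python
def build_mapping_table(dataset_by_substation, labeled_dataset_ids):
    """Map each substation id to its data set id plus a labeling status field."""
    labeled = set(labeled_dataset_ids)
    return {
        sid: {
            "dataset_id": ds,
            "status": "labeled" if ds in labeled else "unlabeled",
        }
        for sid, ds in dataset_by_substation.items()
    }


def substations_by_status(table, status):
    """List the substations whose data set has the given labeling status."""
    return [sid for sid, row in table.items() if row["status"] == status]


table = build_mapping_table(
    {"A": "ds-1", "B": "ds-2", "C": "ds-3"},  # one data set per substation
    labeled_dataset_ids=["ds-2"],
)
labeled_substations = substations_by_status(table, "labeled")
unlabeled_substations = substations_by_status(table, "unlabeled")
```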

The first training module 15 is configured to train the M labeled supervised identifiers and the M labeled unsupervised identifiers of the M labeled automobile substations using the M labeled automobile substation historical data sets respectively, to obtain M labeled supervised identifier parameters and M labeled unsupervised identifier parameters.

Specifically, the M labeled automobile substation historical data sets are received. These labeled data are used to train the M labeled supervised identifiers corresponding to the M labeled automobile substations. The supervised identifiers are machine learning models, such as neural networks, decision trees or support vector machines, trained on the relationship between known inputs and outputs; the model parameters are adjusted continuously so that the model better fits the input-output relationship in the training data. At the same time, the module trains the M labeled unsupervised identifiers using the same labeled data sets. After a given number of training rounds, the M labeled supervised identifier parameters and the M labeled unsupervised identifier parameters are extracted.

The second training module 16 is configured to train the N unlabeled supervised identifiers and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations using the N unlabeled automobile substation historical data sets respectively, to obtain N unlabeled supervised identifier parameters and N unlabeled unsupervised identifier parameters.

Specifically, the N unlabeled automobile substation historical data sets are received. The N unlabeled supervised identifiers of the N unlabeled automobile substations are trained, for example, by semi-supervised learning, which uses part of the labeled data together with a large amount of unlabeled data, or by self-supervised learning, which generates its own labels or tasks from the unlabeled data. With these methods the model can learn useful representations or features from unlabeled data. The N unlabeled unsupervised identifiers are trained using the unlabeled data; these unsupervised identifiers are used for tasks such as clustering, dimensionality reduction and anomaly detection, to discover the intrinsic structures and associations in the data. After training is completed, the N unlabeled supervised identifier parameters and the N unlabeled unsupervised identifier parameters are extracted.

The parameter distribution module 17 is configured to input the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters into a server of the management and control platform for parameter updating, to obtain updated supervised identifier parameters, and to distribute the updated supervised identifier parameters to the M labeled supervised identifiers of the M labeled automobile substations and the N unlabeled supervised identifiers of the N unlabeled automobile substations.

Specifically, the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters are uploaded to the server of the management and control platform. After receiving all the parameters, the server integrates and stores them. The server may integrate and optimize these parameters using weighted averaging, federated learning, or other ensemble learning methods, generating updated supervised identifier parameters. The server packages the updated supervised identifier parameters for distribution and distributes them to the M labeled supervised identifiers of the M labeled automobile substations and the N unlabeled supervised identifiers of the N unlabeled automobile substations. After each automobile substation receives the new parameters, it applies them to its local supervised identifier, completing the update of the model parameters. After the update, a consistency check can be performed to confirm that the model parameters of all substations have been successfully updated to the latest parameters.
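Of the integration methods named above, weighted averaging is the simplest to illustrate. The sketch below assumes each substation uploads a flat list of floats and that the weights are something like local sample counts; both are assumptions, as the application does not fix the parameter format or the weighting scheme.

```python
def aggregate_parameters(client_params, client_weights):
    """Weighted average of per-substation parameter vectors (lists of floats)."""
    total = sum(client_weights[sid] for sid in client_params)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(next(iter(client_params.values())))
    merged = [0.0] * length
    for sid, params in client_params.items():
        if len(params) != length:
            raise ValueError("all parameter vectors must have the same length")
        share = client_weights[sid] / total
        for i, value in enumerate(params):
            merged[i] += share * value
    return merged


def distribute(merged, substation_ids):
    """Give every substation its own copy of the updated parameters."""
    return {sid: list(merged) for sid in substation_ids}


uploaded = {"A": [1.0, 2.0], "B": [3.0, 4.0]}   # A labeled, B unlabeled
weights = {"A": 1.0, "B": 3.0}                  # hypothetical sample counts
updated = aggregate_parameters(uploaded, weights)
local_copies = distribute(updated, uploaded.keys())
```

Here substation B contributes three quarters of the average, so `updated` lies closer to B's parameters than to A's.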

The parameter updating module 18 is configured to input the M labeled unsupervised identifier parameters and the N unlabeled unsupervised identifier parameters into the server of the management and control platform for parameter updating, to obtain updated unsupervised identifier parameters, and to distribute the updated unsupervised identifier parameters to the M labeled unsupervised identifiers of the M labeled automobile substations and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations.

Specifically, the M labeled unsupervised identifier parameters and the N unlabeled unsupervised identifier parameters are uploaded to the server of the management and control platform. The server may integrate and optimize these parameters using weighted averaging, federated learning, or other ensemble learning methods, generating updated unsupervised identifier parameters. After the validity of the updated unsupervised identifier parameters has been verified and confirmed, the parameters are ready to be distributed to the individual substations. The management and control platform distributes the updated unsupervised identifier parameters to the M labeled unsupervised identifiers of the M labeled automobile substations and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations. After each automobile substation receives the new parameters, it applies them to its local unsupervised identifier, completing the update of the model parameters. After all substations have completed the parameter update, a consistency check is performed to confirm that the unsupervised identifiers of all substations have been successfully updated to the latest parameters.
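One possible way to carry out such a consistency check is to compare a digest of each substation's local parameters with a digest of the server's parameters. The application does not say how the check is performed, so the use of SHA-256 over a JSON serialization, and both function names, are assumptions made for illustration.

```python
import hashlib
import json


def parameter_digest(params):
    """SHA-256 digest of a JSON serialization of the parameter list."""
    payload = json.dumps(params).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_stale_substations(server_params, local_params_by_substation):
    """Return the substations whose local parameters differ from the server's."""
    expected = parameter_digest(server_params)
    return [
        sid
        for sid, params in local_params_by_substation.items()
        if parameter_digest(params) != expected
    ]


server_params = [0.5, -1.25, 3.0]
local = {
    "A": [0.5, -1.25, 3.0],   # updated successfully
    "B": [0.5, -1.25, 2.0],   # still holds an older value
}
stale = find_stale_substations(server_params, local)
```

Comparing digests rather than full vectors keeps the check cheap to transmit; substations listed in `stale` could then be sent the parameters again.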

The analysis module acquisition module 19 is configured to obtain M trained labeled data analysis modules and N trained unlabeled data analysis modules after multiple training iterations, once a preset number of training rounds is reached.

Specifically, a preset number of training rounds, that is, the number of iterations, is set. In each training round, the M labeled data analysis modules are trained using the labeled data, and each module performs forward propagation, loss calculation, backpropagation and parameter updating on its own labeled data. Parameters can be optimized with algorithms such as stochastic gradient descent or Adam. The N unlabeled data analysis modules are trained using the unlabeled data, according to the selected unsupervised learning algorithm, such as clustering, dimensionality reduction or an autoencoder. For self-supervised learning, pseudo-labels may be generated and used for training. After each training round, the model is evaluated, and a validation set may be used to check performance. Hyperparameters such as the learning rate and regularization parameters are adjusted according to the evaluation result. For an unlabeled data analysis module, internal evaluation metrics, such as the silhouette coefficient of a clustering algorithm, may be used to evaluate performance. When the preset number of training rounds is reached, training stops, and M trained labeled data analysis modules and N trained unlabeled data analysis modules are obtained.
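The round structure can be sketched with a deliberately small model. The example assumes that each round consists of local training at every substation followed by plain averaging on the server, and it uses a one-parameter linear model y ≈ w·x with a mean-squared-error loss and ordinary gradient descent; the model, learning rate, step count and data are all illustrative choices, and the unsupervised branch is omitted.

```python
def local_train(w, data, lr=0.05, steps=5):
    """A few gradient descent steps on mean squared error for y ≈ w * x."""
    for _ in range(steps):
        # Forward pass and loss gradient: d/dw mean((w*x - y)^2)
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # parameter update
    return w


def run_training(datasets, rounds, w0=0.0):
    """Repeat local training and server-side averaging for a preset number of rounds."""
    global_w = w0
    for _ in range(rounds):
        local_ws = [local_train(global_w, data) for data in datasets.values()]
        global_w = sum(local_ws) / len(local_ws)  # server update, then redistribute
    return global_w


datasets = {
    "A": [(1.0, 2.0), (2.0, 4.0)],   # both substations follow y = 2x
    "B": [(1.0, 2.0), (3.0, 6.0)],
}
final_w = run_training(datasets, rounds=20)
```

Because both hypothetical substations follow the same underlying relation, the shared parameter converges toward 2.0; training stops purely on the round count, as described above.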

The data uploading module 20 is configured to perform edge calculation on the data collected by the K automobile substations based on the M labeled data analysis modules and the N unlabeled data analysis modules, and upload the calculation result to the management and control platform.

Specifically, the M labeled data analysis modules and the N unlabeled data analysis modules are deployed to their respective automobile substations, or it is otherwise ensured that the modules can access the collected data in a distributed manner. Before edge calculation is performed, the raw data collected by each automobile substation undergoes the necessary preprocessing, such as cleaning, denoising and formatting, and features are extracted or transformed according to the requirements of the data analysis module. The preprocessed data is input into the labeled data analysis module, which uses its trained model to process and analyze the data, for example by classification, prediction or anomaly detection. The preprocessed unlabeled data is input into the unlabeled data analysis module, which performs clustering, dimensionality reduction or other unsupervised learning tasks on the data according to its trained unsupervised learning model, identifying patterns, structures or outliers in the data and generating the corresponding analysis results. The analysis results generated by the modules are then integrated, and the result data is compressed, encrypted or formatted as required to facilitate transmission and storage. A secure communication connection is established between the automobile substation and the management and control platform, and the integrated calculation results are uploaded to the management and control platform over the network. Periodic, real-time or event-triggered upload policies may be employed.
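The formatting and compression of results before upload might look like the following sketch, which serializes a result dictionary to JSON and compresses it with zlib. The payload layout and function names are assumptions; encryption and the network transfer itself are left out because the application does not specify them.

```python
import json
import zlib


def package_results(substation_id, results):
    """Format the edge calculation results as JSON and compress them for upload."""
    body = json.dumps(
        {"substation_id": substation_id, "results": results},
        sort_keys=True,
    ).encode("utf-8")
    return zlib.compress(body)


def unpack_results(blob):
    """Inverse of package_results, as the platform side might apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


results = {
    "predicted_sales": [12, 15, 9],        # from the supervised identifier
    "anomalous_days": ["2024-06-02"],      # from the unsupervised identifier
}
blob = package_results("A", results)
restored = unpack_results(blob)
```

The same `package_results` call can sit behind any of the upload policies mentioned above, whether it is triggered by a timer, by each new result, or by an event.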

Further, the preset index set includes a vehicle sales index, a user feedback index, a market index, and a user behavior index.

Specifically, the vehicle sales index includes sales volume and sales amount: sales volume measures the number of vehicles sold over a period of time, and sales amount reflects the market share and sales capacity of the enterprise. The user feedback index includes customer satisfaction, customer complaint rate, proportion of customers expressing dissatisfaction, repeat purchase rate and customer acceptance rate: customer satisfaction is obtained through surveys and measures how satisfied customers are with products and services; the customer complaint rate reflects the problems and pain points customers encounter with products and services; and the repeat purchase rate and customer acceptance rate reflect the degree to which customers accept the enterprise's products and services. The market index includes market share, which is the proportion of the enterprise's sales to the sales of the whole market and reflects the enterprise's competitive position in the market, and market growth rate, which reflects the overall growth trend of the market and potential opportunities. The user behavior index includes user activity, measured by engagement metrics such as average dwell time and number of pages visited and used to evaluate users' interest in the product and frequency of use, and user stickiness, which concerns users' continued visits over a period of time, such as visit frequency and the interval between visits, and reflects how dependent users are on the product.
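Two of the market indicators can be written out as short functions. Market share follows the definition given above (enterprise sales as a proportion of whole-market sales); the growth rate formula, (current - previous) / previous, is the conventional definition and is an assumption here, since the text does not spell it out. The figures are made up for illustration.

```python
def market_share(enterprise_sales, total_market_sales):
    """Enterprise sales as a proportion of sales in the whole market."""
    if total_market_sales <= 0:
        raise ValueError("total market sales must be positive")
    return enterprise_sales / total_market_sales


def market_growth_rate(previous_sales, current_sales):
    """Relative change in market sales between two periods (assumed formula)."""
    if previous_sales <= 0:
        raise ValueError("previous period sales must be positive")
    return (current_sales - previous_sales) / previous_sales


share = market_share(enterprise_sales=250, total_market_sales=1000)
growth = market_growth_rate(previous_sales=1000, current_sales=1100)
```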

Further, as shown in FIG. 2, the online car market automobile substation data management and control platform based on edge computing further includes a data export module, configured to:

determining a labeling target set and a data annotation template based on the preset index set; performing data preprocessing on the M automobile substation historical data sets to obtain M processed automobile substation historical data sets; and annotating the M processed automobile substation historical data sets based on the labeling target set and the data annotation template, and exporting the annotated data according to a preset format to obtain the M labeled automobile substation historical data sets.

Specifically, the labeling targets are first made explicit for each index. For the vehicle sales index, the sales data to be annotated, such as sales volume, sales amount, and sales growth rate, is specified together with the corresponding units, such as vehicles or ten thousand yuan, and time periods, such as months, quarters, or years. For the user feedback index, the specific content of user feedback, such as satisfaction scores, evaluation keywords, and problem feedback, is determined, and a scoring standard or classification system is set. For the market index, market-related data such as market share, competitor performance, and market trends is identified, and the way these indicators are quantified or described is determined. For the user behavior index, the specific types of user behavior, such as number of views, click-through rate, conversion rate, and dwell time, are defined, and the manner in which these behaviors are recorded is determined. A suitable data labeling tool is selected that supports labeling of multiple data types such as text, numbers, and charts, and a data annotation template containing all fields to be annotated and their format requirements is designed according to the preset index set. The extracted data is cleaned to remove duplicate, invalid, or erroneous records, and is then preliminarily classified and sorted according to the labeling requirements to facilitate subsequent labeling. Sales data is labeled according to preset fields such as sales volume and sales amount, with the corresponding values and units filled in. For user feedback, satisfaction scores are labeled using a numerical or star scale; evaluation keywords, such as "high cost performance" and "good service", are extracted, classified, and labeled; and problem feedback is recorded in detail and, where necessary, classified and labeled by problem type. Data such as market share and competitor performance is annotated quantitatively, for example using percentages to represent market share, while market trends are annotated descriptively, for example as "continuously increasing" or "slightly decreasing". User behaviors such as browsing, clicking, and purchasing are counted and labeled with the corresponding counts or proportions, and contextual information such as the time, place, and device of each behavior is also labeled. Quality control measures, such as periodic spot checks of labeling results and labeling accuracy targets, are applied during labeling. The labeled data is organized according to the preset format to ensure that it is structured and readable, and the formatted annotation data is exported to form the M labeled automobile substation historical data sets. Finally, the exported data is verified to ensure its integrity and accuracy.
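
A data annotation template of the kind described above could be represented as a typed record with a validation step. The following Python sketch is illustrative only: the field names, the 1 to 5 star scale, and the set of allowed trend labels are assumptions chosen for the example and are not defined by the application.

```python
from dataclasses import dataclass, field

# Hypothetical descriptive labels for market trends.
ALLOWED_TRENDS = {"continuously increasing", "slightly decreasing", "stable"}


@dataclass
class AnnotationRecord:
    substation_id: str
    period: str                    # e.g. "2024-Q2"
    sales_volume: int              # unit: vehicles
    sales_amount_10k_yuan: float   # unit: ten thousand yuan
    satisfaction_score: int        # assumed 1..5 star scale
    evaluation_keywords: list = field(default_factory=list)
    market_share_pct: float = 0.0  # percentage, 0..100
    market_trend: str = "stable"   # descriptive label
    click_count: int = 0           # user behavior count


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if record.sales_volume < 0:
        problems.append("sales_volume must be non-negative")
    if not 1 <= record.satisfaction_score <= 5:
        problems.append("satisfaction_score must be in 1..5")
    if not 0.0 <= record.market_share_pct <= 100.0:
        problems.append("market_share_pct must be in 0..100")
    if record.market_trend not in ALLOWED_TRENDS:
        problems.append("unknown market_trend label")
    return problems
```

A check of this kind corresponds to the final verification step: records that return a non-empty problem list would be sent back for re-labeling before export.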

Further, the online car market automobile substation data management and control platform based on edge computing further comprises a parameter output module for:

Dividing the M labeled automobile substation historical data sets according to a preset division ratio to obtain M training sets and M test sets; training the M labeled supervised identifiers by using the M training sets, supervising the training process by using the M test sets, and updating network parameters according to the supervision results until the model converges, to obtain model parameters of the M labeled supervised identifiers; and training the M labeled unsupervised identifiers of the M labeled automobile substations by using the M labeled automobile substation historical data sets to obtain M labeled unsupervised identifier parameters, wherein the labels in the M labeled automobile substation historical data sets do not participate in training.

Specifically, each labeled automobile substation historical data set is divided according to a preset division ratio, for example 80% for the training set and 20% for the test set, forming M training sets and M test sets. Initial network parameters are set for the M labeled supervised identifiers, and each identifier is trained on its corresponding training set. During training, predicted values are computed by forward propagation and compared with the true labels to calculate the loss function. The training process is supervised with the M test sets: after each training epoch, the performance of the model is evaluated on the test set, and the network parameters are updated according to evaluation results such as accuracy and loss value, using an optimization algorithm such as gradient descent. The training and supervision steps are repeated until the model performance reaches a preset standard or no longer improves appreciably, that is, until the model converges, at which point the model parameters of the M labeled supervised identifiers are saved. For the unsupervised identifiers, although the M labeled automobile substation historical data sets are used, the label information does not participate in training, so only the feature portion of each data set is used. Initial parameters are set for the M labeled unsupervised identifiers, which are then trained on the M labeled automobile substation historical data sets using unsupervised learning algorithms such as clustering and dimensionality reduction. These algorithms learn representations or groupings of the data from its inherent structure and associations, and the parameters of the M labeled unsupervised identifiers are updated according to the unsupervised learning objective and the characteristics of the algorithm. Training stops when a stopping condition is reached, such as a preset number of iterations or convergence of the loss function, and the parameters of the M labeled unsupervised identifiers are saved.
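
The data division and the convergence check described above can be sketched in a few lines of Python. The function names split_dataset and has_converged, the fixed random seed, and the patience and min_delta values are hypothetical; the application specifies only a preset division ratio and stopping when performance "is not obviously improved".

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into (training set, test set)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def has_converged(losses, patience=3, min_delta=1e-3):
    """True when none of the last `patience` evaluations improved on the
    best earlier loss by at least `min_delta`."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return all(best_before - loss < min_delta for loss in losses[-patience:])
```

Here `losses` is assumed to hold one evaluation loss per training epoch, measured on the test set used to supervise training.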

Further, the parameter output module is further configured to:

extracting features from the M labeled automobile substation historical data sets to obtain M labeled automobile substation historical feature sets; inputting the M labeled automobile substation historical feature sets into the M labeled unsupervised identifiers for training to obtain M evaluation indexes of the M labeled unsupervised identifiers; and adjusting the identifier parameters based on the M evaluation indexes, performing repeated iterative optimization until the evaluation indexes meet the preset requirements, and outputting the parameters of the M trained labeled unsupervised identifiers to obtain the M labeled unsupervised identifier parameters.

Specifically, the M labeled automobile substation historical data sets are first confirmed to have undergone appropriate preprocessing, including data cleaning and normalization or standardization, to eliminate the effects of outliers and differing scales on feature extraction. Features are then extracted from each labeled automobile substation historical data set using feature engineering techniques; the features include vehicle model, price, sales volume, user feedback keywords, market trend indicators, and the like. The most representative and predictive features are selected from those extracted, reducing the feature dimensionality and improving the generalization ability of the model. Feature selection may be performed using correlation analysis, principal component analysis, or similar methods. After feature engineering and feature selection, the M labeled automobile substation historical feature sets are obtained. Initial parameters are set for the M labeled unsupervised identifiers, and the M feature sets are input into the corresponding identifiers for training. Depending on the unsupervised learning algorithm used, such as clustering or dimensionality reduction, each identifier learns the inherent structure and associations of the data. After training, M evaluation indexes are calculated with a preset evaluation method, such as the silhouette coefficient or the Davies-Bouldin index, to quantify the performance of each unsupervised identifier. The evaluation index of each identifier is analyzed to identify performance bottlenecks and room for optimization, and the parameters of the identifier, including the learning rate, the number of clusters, and the regularization strength, are adjusted accordingly. The training, evaluation, and parameter adjustment steps are repeated until the evaluation index meets the preset requirement or the maximum number of iterations is reached. In each iteration, the evaluation indexes obtained under different parameter combinations are recorded and compared to find the best parameter setting. When the evaluation indexes meet the preset requirements, the best parameters of the M trained labeled unsupervised identifiers are output, yielding the M labeled unsupervised identifier parameters.
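
As an illustration of how an evaluation index can drive the comparison of parameter combinations, the following Python sketch computes the mean silhouette coefficient for one-dimensional points and selects the candidate clustering with the highest score. It is a simplified, assumption-laden example: the function names are hypothetical, the features are reduced to a single dimension with absolute difference as the distance, and the candidate labelings are supplied by hand rather than produced by a trained identifier.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points with cluster labels."""
    n = len(points)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i in range(n):
        # Mean distance from point i to each cluster, excluding i itself.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                dists = [abs(points[i] - points[j]) for j in members]
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist[own]                                       # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)     # separation
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def pick_best(points, candidate_labelings):
    """Return the name of the labeling with the highest silhouette score."""
    return max(candidate_labelings,
               key=lambda name: silhouette_score(points, candidate_labelings[name]))
```

A score close to 1 indicates compact, well separated clusters, so a preset requirement could be expressed as a minimum silhouette value that the selected parameter combination must reach.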

Further, the online car market automobile substation data management and control platform based on edge computing further comprises an unlabeled unsupervised identifier parameter acquisition module for:

Performing self-supervised learning training on the N unlabeled supervised identifiers of the N unlabeled automobile substations by using the N unlabeled automobile substation historical data sets respectively, until a preset condition is met, and outputting the parameters of the N trained unlabeled supervised identifiers to obtain N unlabeled supervised identifier parameters; and training the N unlabeled unsupervised identifiers of the N unlabeled automobile substations by using the N unlabeled automobile substation historical data sets respectively, until the model converges, and outputting the parameters of the N trained unlabeled unsupervised identifiers to obtain N unlabeled unsupervised identifier parameters.

Specifically, the N unlabeled automobile substation historical data sets are first preprocessed appropriately, including data cleaning and standardization. Because the data is unlabeled, a self-supervised learning task is designed; for example, pseudo-labels may be generated through data augmentation techniques. Initial parameters are set for the supervised identifiers of the N unlabeled automobile substations, and the N unlabeled supervised identifiers are trained using the designed self-supervised learning task and the pseudo-labels. Predicted values are computed by forward propagation, the loss function is calculated by comparison with the pseudo-labels, and the model parameters are updated using an optimization algorithm such as gradient descent. During training, the preset stopping conditions are checked regularly, for example whether the maximum number of iterations has been reached, the loss function has converged, or performance on the validation set has stopped improving. When a preset condition is met, training stops and the parameters of the N trained unlabeled supervised identifiers are output, yielding the N unlabeled supervised identifier parameters. For the unsupervised identifiers, the N unlabeled automobile substation historical data sets are likewise preprocessed, and initial parameters are set for the unsupervised identifiers of the N unlabeled automobile substations. The N unlabeled unsupervised identifiers are trained using unsupervised learning algorithms such as clustering, autoencoders, or generative adversarial networks, so that each model learns the intrinsic structure and representation of the data according to the characteristics of the selected algorithm. The loss function, reconstruction error, or other relevant indicators are monitored during training to determine whether the model has converged. When the model converges or the maximum number of iterations is reached, training stops and the parameters of the N trained unlabeled unsupervised identifiers are output, yielding the N unlabeled unsupervised identifier parameters.
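
One simple way to obtain pseudo-labels from unlabeled records is to hide one feature of each record and ask the identifier to recover it. The Python sketch below illustrates this masking scheme. It is an assumption made for the example: the application says only that pseudo-labels may be generated by data augmentation and does not name a specific technique, and the function name make_masked_pairs and the mask value are hypothetical.

```python
import random


def make_masked_pairs(rows, mask_value=0.0, seed=0):
    """Turn unlabeled feature rows into (masked_input, pseudo_label) pairs.

    The pseudo-label records which feature was hidden and its original
    value, so a model can be trained to reconstruct the hidden feature.
    """
    rng = random.Random(seed)
    pairs = []
    for row in rows:
        idx = rng.randrange(len(row))       # feature position to hide
        masked = list(row)                  # copy; the input row is untouched
        pseudo_label = (idx, masked[idx])
        masked[idx] = mask_value
        pairs.append((masked, pseudo_label))
    return pairs
```

The loss compared against the pseudo-label would then be, for instance, the squared error between the predicted and the original value of the hidden feature.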

Further, the unlabeled unsupervised identifier parameter acquisition module is further configured to:

determining N proxy tasks, and performing parameter initialization on the N unlabeled supervised identifiers based on the N proxy tasks; pre-training the N unlabeled supervised identifiers by using the N unlabeled automobile substation historical data sets, and evaluating the training results by using unsupervised evaluation indexes to obtain N pre-training fitness values; and fine-tuning the parameters of the N unlabeled supervised identifiers based on the N pre-training fitness values until a preset condition is met, and outputting the parameters of the N trained unlabeled supervised identifiers to obtain the N unlabeled supervised identifier parameters, wherein the preset condition is that the pre-training fitness meets a preset fitness threshold.

Specifically, N suitable proxy tasks are determined according to the characteristics of the unlabeled automobile substation historical data sets and the training objectives. These proxy tasks are tasks that are related to the final objective and can be learned from unlabeled data. Parameter initialization is then performed for the N unlabeled supervised identifiers based on the N proxy tasks, with the initial structure and parameters of each identifier configured according to the requirements of its proxy task. The N unlabeled automobile substation historical data sets are confirmed to have been preprocessed appropriately for training, and the N unlabeled supervised identifiers are pre-trained on them. During pre-training, each identifier attempts to solve its corresponding proxy task. Depending on the nature of the proxy task, suitable unsupervised evaluation indexes, such as reconstruction error or clustering quality, are selected to quantify the performance of the identifier. The pre-training results of the N unlabeled supervised identifiers are evaluated with the selected indexes to obtain N pre-training fitness values, and the fitness value of each identifier is analyzed to determine whether further fine-tuning is needed. The parameters of the N unlabeled supervised identifiers are then fine-tuned based on the pre-training fitness values, which may involve adjusting the network architecture, the learning rate, or other hyperparameters. During fine-tuning, the change in the fitness value is monitored continuously to confirm that the adjustments are effective, and the pre-training fitness is checked periodically against the preset fitness threshold. Once the threshold is reached or exceeded, pre-training is considered successful. When the preset condition is met, that is, when the pre-training fitness satisfies the preset fitness threshold, fine-tuning stops and the parameters of the N trained unlabeled supervised identifiers are output. These parameters are used for subsequent tasks or further training.
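
The fine-tuning loop described above, which adjusts parameters until the pre-training fitness reaches the preset threshold, can be sketched as follows. The function name fine_tune_until_fit and the callables evaluate and adjust are hypothetical placeholders for the unsupervised evaluation index and the parameter update rule; the max_rounds cap is an added safeguard that the application does not mention.

```python
def fine_tune_until_fit(evaluate, adjust, params, threshold, max_rounds=50):
    """Adjust params until evaluate(params) >= threshold.

    evaluate(params) -> float fitness, higher is better (assumed convention).
    adjust(params, fitness) -> new params for the next round.
    Returns (params, fitness, rounds_used).
    """
    fitness = evaluate(params)
    rounds = 0
    while fitness < threshold and rounds < max_rounds:
        params = adjust(params, fitness)
        fitness = evaluate(params)
        rounds += 1
    return params, fitness, rounds
```

For an index where lower is better, such as reconstruction error, the fitness passed to this loop could be defined as the negated error so that the same threshold comparison applies.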

Further, the online car market automobile substation data management and control platform based on edge computing further comprises a mean value calculation module for:

Inputting the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters into the server to obtain a supervised parameter labeling space, wherein the supervised parameter labeling space comprises K spatial points, and M+N=K; analyzing the K spatial points by using a box plot method, and determining a first quartile and a third quartile; and calculating the mean of the plurality of spatial points between the first quartile and the third quartile to obtain the updated supervised identifier parameters.

Specifically, the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters are input into the server, where they are integrated into a supervised parameter labeling space. This space contains K spatial points, where M+N=K, each representing the parameter set of one identifier. The K spatial points are analyzed using the box plot method. A box plot is a statistical chart that displays the dispersion of a set of data, including the maximum, minimum, median, and upper and lower quartiles. The box plot method determines the first quartile Q1, the 25th percentile, and the third quartile Q3, the 75th percentile. Together with the median, these values divide the data set into four equal parts, each representing a different distribution interval of the data. The spatial points lying between the first quartile and the third quartile are selected from the supervised parameter labeling space; these points represent a relatively concentrated and stable region of the parameter space. The mean of the selected spatial points is then calculated, giving a representative parameter set that integrates the parameter characteristics of multiple identifiers, and this mean is taken as the updated supervised identifier parameter. Because these parameters are derived from a statistical analysis of both labeled and unlabeled identifier parameters, with points outside the interquartile range excluded, they are expected to generalize better and be more robust.
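
The quartile-based averaging can be illustrated with a short Python sketch. The application does not state how "between the first quartile and the third quartile" is evaluated for multi-dimensional parameter sets; the sketch assumes, purely for illustration, that the rule is applied independently to each parameter coordinate across the K substations. The function names are hypothetical, and the fallback to the median when no value lies inside the interquartile range (which can occur when K=2) is likewise an assumption.

```python
from statistics import mean, quantiles


def interquartile_mean(values):
    """Mean of the values lying in [Q1, Q3], quartiles computed inclusively."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    kept = [v for v in values if q1 <= v <= q3]
    # Assumed fallback: with very few points nothing may fall inside [Q1, Q3].
    return mean(kept) if kept else median


def aggregate(param_sets):
    """Combine K parameter vectors into one, coordinate by coordinate."""
    return [interquartile_mean(coord) for coord in zip(*param_sets)]
```

For example, five substations reporting the value 1, 2, 3, 4, and 100 for one parameter yield Q1=2 and Q3=4, so the extreme value 100 is excluded and the updated parameter is the mean of 2, 3, and 4.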

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the present application and the equivalent techniques thereof, the present application is also intended to include such modifications and variations.

Claims

1. The online car market automobile substation data management and control platform based on edge computing is characterized by including:

A data extraction module, used to extract data from K automobile substations according to a preset indicator set within a preset historical window to obtain K automobile substation historical data sets;

An edge computing module, used to obtain K data analysis modules corresponding to the K automobile substations, wherein the K data analysis modules are used to perform edge computing on the data collected by the K automobile substations, and the K data analysis modules include K supervised identifiers and K unsupervised identifiers;

A data labeling module, used to randomly select M automobile substation historical data sets from the K automobile substation historical data sets for data labeling, and obtain M labeled automobile substation historical data sets and N unlabeled automobile substation historical data sets;

An automobile substation module, used for obtaining M labeled automobile substations and N unlabeled automobile substations based on the mapping of the M labeled automobile substation historical data sets and the N unlabeled automobile substation historical data sets;

A first training module, used to train the M labeled supervised identifiers and the M labeled unsupervised identifiers of the M labeled automobile substations respectively by using the M labeled automobile substation historical data sets to obtain M labeled supervised identifier parameters and M labeled unsupervised identifier parameters;

A second training module, used to train the N unlabeled supervised identifiers and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations respectively by using the N unlabeled automobile substation historical data sets to obtain N unlabeled supervised identifier parameters and N unlabeled unsupervised identifier parameters;

A parameter distribution module, used for inputting the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters into the server of the management and control platform for parameter update, obtaining updated supervised identifier parameters, and distributing the updated supervised identifier parameters to the M labeled supervised identifiers of the M labeled automobile substations and the N unlabeled supervised identifiers of the N unlabeled automobile substations;

A parameter updating module, used for inputting the M labeled unsupervised identifier parameters and the N unlabeled unsupervised identifier parameters into the server of the management and control platform for parameter updating, obtaining updated unsupervised identifier parameters, and distributing the updated unsupervised identifier parameters to the M labeled unsupervised identifiers of the M labeled automobile substations and the N unlabeled unsupervised identifiers of the N unlabeled automobile substations;

The analysis module acquisition module is used to obtain M labeled data analysis modules and N unlabeled data analysis modules after multiple training iterations until the preset training rounds are met;

A data uploading module, used to perform edge computing on the data collected by the K automobile substations based on the M labeled data analysis modules and the N unlabeled data analysis modules, and upload the computing results to the management and control platform;

It also includes an unlabeled unsupervised identifier parameter acquisition module for:

Using the N unlabeled automobile substation historical data sets respectively to perform self-supervised learning training on the N unlabeled supervised identifiers of the N unlabeled automobile substations until a preset condition is met, and outputting the parameters of the trained N unlabeled supervised identifiers to obtain N unlabeled supervised identifier parameters;

Using the N unlabeled automobile substation historical data sets respectively to train the N unlabeled unsupervised identifiers of the N unlabeled automobile substations until the model converges, and outputting the parameters of the trained N unlabeled unsupervised identifiers to obtain N unlabeled unsupervised identifier parameters;

Wherein, the unlabeled unsupervised identifier parameter acquisition module is also used for:

Determine N proxy tasks, and perform parameter initialization setting for the N unlabeled supervised identifiers based on the N proxy tasks;

Pre-training the N unlabeled supervised identifiers using the N unlabeled automobile substation historical data sets, and evaluating the training results using unsupervised evaluation indicators to obtain N pre-training fitnesses;

Based on the N pre-trained fitnesses, the parameters of the N unlabeled supervised identifiers are fine-tuned until a preset condition is met, and the parameters of the N unlabeled supervised identifiers that have been trained are output to obtain N unlabeled supervised identifier parameters, wherein the preset condition is that the pre-trained fitness satisfies a preset fitness threshold.

2. The online car market automobile sub-station data management and control platform based on edge computing as described in claim 1 is characterized in that the preset indicator set includes vehicle sales indicators, user feedback indicators, market indicators and user behavior indicators.

3. The online car market automobile substation data management and control platform based on edge computing as claimed in claim 1, characterized in that it also includes a data export module for:

Determine a labeling target set and a data labeling template based on the preset indicator set;

Performing data preprocessing on the M automobile substation historical data sets to obtain M processed automobile substation historical data sets;

The M processed automobile substation historical data sets are annotated based on the labeling target set and the data annotation template, and the annotated data are exported according to a preset format to obtain the M labeled automobile substation historical data sets.

4. The online car market automobile substation data management and control platform based on edge computing as claimed in claim 1, characterized in that it also includes a parameter output module for:

The M annotated automobile substation historical data sets are divided according to a preset division ratio to obtain M training sets and M test sets;

Using the M training sets to train the M labeled supervised identifiers, and using the M test sets to supervise the training process, and updating network parameters according to the supervision results until the model converges, thereby obtaining the model parameters of the M labeled supervised identifiers;

The M labeled automobile substation historical data sets are used to train the M labeled unsupervised identifiers of the M labeled automobile substations to obtain M labeled unsupervised identifier parameters, wherein the labels in the M labeled automobile substation historical data sets do not participate in the training.

5. The online car market automobile substation data management and control platform based on edge computing according to claim 4, characterized in that the parameter output module is also used for:

Performing feature extraction on the M labeled automobile substation historical data sets to obtain M labeled automobile substation historical feature sets;

Inputting the M labeled automobile substation historical feature sets into the M labeled unsupervised identifiers for training, and obtaining M evaluation indicators of the M labeled unsupervised identifiers;

Based on the M evaluation indicators, the parameters of the identifier are adjusted. After multiple iterations of optimization, until the evaluation indicators meet the preset requirements, the parameters of the M labeled unsupervised identifiers that have been trained are output to obtain the parameters of the M labeled unsupervised identifiers.

6. The online car market automobile substation data management and control platform based on edge computing according to claim 1 is characterized in that it also includes a mean calculation module for:

Inputting the M labeled supervised identifier parameters and the N unlabeled supervised identifier parameters into the server to obtain a labeled supervised parameter space, wherein the labeled supervised parameter space includes K spatial points, M+N=K;

Analyze the K spatial points using a box plot method to determine the first quartile and the third quartile;

The updated supervised identifier parameters are obtained by calculating the mean of multiple spatial points between the first quartile and the third quartile.