CN118962455A

CN118962455A - A battery analysis method

Info

Publication number: CN118962455A
Application number: CN202411045141.5A
Authority: CN
Inventors: 陈军; 陈猛
Original assignee: Dongguan Quanchuangda Energy Co ltd
Current assignee: Dongguan Quanchuangda Energy Co ltd
Priority date: 2024-08-01
Filing date: 2024-08-01
Publication date: 2024-11-15
Anticipated expiration: 2044-08-01
Also published as: CN118962455B

Abstract

The application discloses a battery analysis method, which relates to the field of energy storage batteries and comprises the following steps: acquiring data of a battery; extracting key features from the preprocessed data; constructing a combined feature according to the key feature; clustering the working states of the batteries by adopting a DBSCAN clustering algorithm according to the average current, the capacity change rate and the temperature change rate to obtain a working mode division result of the batteries; counting the transition frequency among different working modes according to the working modes obtained by clustering, and calculating a transition probability matrix; constructing a neural network-based fault analysis model, taking a working mode division result, key features and combined features as input, taking a fault state of a battery as output, and training the fault analysis model by adopting a supervised learning algorithm; and inputting the battery data to be processed into a trained fault analysis model, and predicting the fault probability of the battery. Aiming at the low accuracy of battery fault analysis in the prior art, the application improves the accuracy of predicting the fault probability of the battery.

Description

Battery analysis method

Technical Field

The application relates to the technical field of energy storage batteries, in particular to a battery analysis method.

Background

With the rapid development of new energy technology, batteries have become a core component in the fields of various electronic devices, electric automobiles, energy storage systems and the like. The reliability and safety of the battery are directly related to the operational stability and service life of the entire system. However, the battery often faces complex and variable working environments and load conditions in practical application, and various faults and performance degradation problems, such as overcharge, overdischarge, overheat, short circuit and the like, are easy to occur. These faults not only lead to reduced battery performance and shortened service life, but also may cause safety accidents, causing serious economic loss and social impact. Therefore, the working state of the battery is timely and accurately analyzed, the potential fault risk is predicted, and the method has important significance for guaranteeing the reliable operation and the safe maintenance of the battery system.

Existing battery fault analysis methods fall mainly into three groups: empirical-model-based, data-driven, and machine-learning-based methods. Empirical-model-based methods build an equivalent circuit model or an electrochemical model of the battery and predict the likelihood of a fault by analyzing the battery's internal parameters and state changes. Such methods typically rely on expert experience and prior knowledge, and adapt poorly to the diversity and complexity of battery operating environments. Data-driven methods use monitoring data collected during battery operation, such as voltage, current and temperature, and uncover hidden fault patterns and regularities through statistical analysis and data mining. However, battery operating data is often noisy and redundant, and limited data quality and feature expressiveness reduce the accuracy of fault analysis. Machine-learning-based methods, such as support vector machines, decision trees and neural networks, learn a mapping between battery state and fault type from a large amount of historical data, enabling intelligent fault diagnosis and prediction. However, these methods often treat the battery as a static system, neglect the dynamic evolution of the battery state, and struggle to capture the temporal dependencies of battery faults, resulting in low prediction accuracy.

In the related art, for example, Chinese patent document CN117347869A provides a data analysis method, apparatus, electronic device and medium for an energy storage battery management system. That scheme acquires a plurality of initial data of the energy storage battery from the energy storage battery management system and derives from them a plurality of target data of higher quality and accuracy. It obtains average current data from the battery current data and a battery capacity change rate from the battery working time data and the battery capacity data, identifies the working mode of the energy storage battery from the average current data and the capacity change rate, and then performs fault analysis with a pre-trained fault analysis model based on the working mode and the target data. However, the working environment and load conditions of a battery are complex and changeable, so a pre-trained model may adapt poorly to the range of situations met in practice; its generalization capability is insufficient, which reduces fault analysis accuracy.

Disclosure of Invention

Aiming at the problem of low accuracy of battery fault analysis in the prior art, the application provides a battery analysis method, which improves the accuracy of predicting the fault probability of a battery by carrying out cluster analysis on the working states of the battery, counting the transition probability among the working modes of the battery, solving the steady-state distribution of a Markov chain and the like.

Technical solution: the object of the present application is achieved by the following technical solution.

The application provides a battery analysis method, which comprises the following steps:

S1, acquiring data of a battery from an energy storage battery management system (BMS), and preprocessing the acquired data; wherein the data comprises current, voltage, temperature, capacity, cycle number and accumulated use time; the preprocessing comprises data cleaning, outlier processing and data normalization;

S2, extracting key features from the preprocessed data; the key features comprise average current, capacity change rate, temperature change rate, cycle number and accumulated use time;

S3, constructing a combined feature according to the key features; wherein the combined feature comprises an interaction term of average current and temperature and a ratio of capacity change rate to cycle number;

S4, clustering the working states of the battery by adopting a DBSCAN clustering algorithm according to the average current, the capacity change rate and the temperature change rate to obtain a working mode division result of the battery;

S5, counting the transition frequencies among different working modes according to the working modes obtained by clustering, and calculating a transition probability matrix, wherein the transition probability matrix is used for representing the probability of transition of the battery among the different working modes;

S6, constructing a fault analysis model based on a neural network, taking the working mode division result, key features and combined features as input, taking a fault state of the battery as output, and training the fault analysis model by adopting a supervised learning algorithm; the fault analysis model comprises at least one of a logistic regression model, a decision tree model, a support vector machine model and a neural network model;

S7, inputting the battery data to be processed into the trained fault analysis model, and predicting the fault probability of the battery.

The average current is the mean value of the battery's charge and discharge current over a given time period. It reflects the battery's average operating state and current level over that period. Calculating the average current indicates the average working intensity of the battery and helps judge whether the battery is in an abnormal state such as overcharge or overdischarge.

The capacity change rate represents the degree to which the battery capacity has changed relative to the initial capacity. It may be calculated by dividing the difference between the current capacity and the initial capacity by the initial capacity. The capacity change rate reflects the state of health and degree of aging of the battery. Capacity gradually decays as the battery is used, and the capacity change rate quantifies this decay trend, providing a basis for evaluating the remaining life of the battery.

The temperature change rate represents how fast the battery temperature changes over time. It can be obtained by calculating the change in temperature per unit time. The temperature change rate reflects the thermal stability and heat dissipation performance of the battery. An excessively high temperature change rate may indicate problems such as abnormal heat generation or poor heat dissipation, and warrants attention.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. It divides a data set into clusters and noise points according to the density relationship of the sample points. In battery working state clustering, the DBSCAN algorithm can automatically discover the typical working modes of the battery from features such as the average current and the capacity change rate.

The working mode of the battery refers to a typical operating state and behavior pattern of the battery during actual use. Different working modes may correspond to different combinations of characteristics such as current, voltage and temperature. Cluster analysis of the battery working state makes it possible to find and define the main working modes of the battery, such as a normal working mode, an overcharge mode and an overdischarge mode. Dividing the working modes helps in understanding the actual use condition of the battery and provides a basis for fault diagnosis and health management.

In supervised learning, training data consists of input features and corresponding true labels. The algorithm continually adjusts the model parameters by minimizing the error between the predicted value and the true label, so as to fit the mapping expressed by the training data. Common supervised learning algorithms include decision trees, random forests, support vector machines and neural networks. In battery fault analysis, a supervised learning algorithm may be used to train a fault analysis model, with the working mode, key features and so on as inputs and the fault state as output.
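As an illustration of the three clustering features defined above, the following is a minimal Python sketch. The function name `extract_key_features`, its parameters, the sample values, and the choice to measure the temperature change rate between the first and last sample of the window are all assumptions made for the example, not details given in the application.

```python
from statistics import fmean

def extract_key_features(current, temperature, timestamps,
                         capacity_now, capacity_initial):
    """Compute the three clustering features for one time window (hypothetical helper)."""
    # Average current: mean of the charge/discharge current samples in the window.
    avg_current = fmean(current)
    # Capacity change rate: (current capacity - initial capacity) / initial capacity.
    # A negative value indicates capacity fade.
    capacity_change_rate = (capacity_now - capacity_initial) / capacity_initial
    # Temperature change rate: temperature change per unit time.
    # Assumption: measured between the first and last sample of the window.
    elapsed = timestamps[-1] - timestamps[0]
    temp_change_rate = (temperature[-1] - temperature[0]) / elapsed
    return {
        "avg_current": avg_current,
        "capacity_change_rate": capacity_change_rate,
        "temp_change_rate": temp_change_rate,
    }

# Illustrative values only: amperes, degrees Celsius, seconds, ampere-hours.
features = extract_key_features(
    current=[10.0, 12.0, 11.0, 9.0],
    temperature=[25.0, 25.5, 26.5, 27.0],
    timestamps=[0.0, 60.0, 120.0, 180.0],
    capacity_now=95.0,
    capacity_initial=100.0,
)
print(features)
```

In practice these values would be computed on the preprocessed (cleaned and normalized) BMS data described in step S1.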

Further, S3, constructing a combined feature according to the key features, wherein the combined feature comprises an interaction term of average current and temperature and a ratio of capacity change rate to cycle number, comprises: performing a product operation on the average current and the temperature to obtain interaction term data of the average current and the temperature; dividing the capacity change rate by the cycle number to obtain ratio data of the capacity change rate and the cycle number; and combining the interaction term data and the ratio data to generate the combined feature.

Multiplying the average current by the temperature yields a new feature. This interaction term captures the interplay between current and temperature. The operating current and temperature of a battery are typically interrelated: a higher current may raise the temperature, and the temperature in turn affects the charge and discharge performance of the battery. Constructing an interaction term of current and temperature better describes the influence of their combined action on the battery's state of health.

Dividing the capacity change rate by the cycle number yields a second new feature. This ratio characterizes the capacity change per cycle. The capacity change of a battery is closely related to the cycle number: as the cycle number increases, the battery capacity typically decays gradually, but the rate of capacity fade may differ from battery to battery. The ratio of the capacity change rate to the cycle number normalizes out the cycle number and reflects the speed of capacity decay more directly, so that the battery's state of health can be evaluated better.

Combining the two combined features with the original key features gives a more comprehensive and more expressive feature set. The combined features carry information about the interaction and proportional relationships among the key features and help uncover latent patterns and regularities in the data. In the subsequent battery working mode clustering and fault analysis model construction, the combined features can be used as input features alongside the key features. Introducing the combined features can improve both the precision with which the clustering algorithm divides working modes and the ability of the fault analysis model to predict the battery's state of health.
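The two combined features can be sketched in a few lines of Python. The function name `build_combined_features` and the zero-cycle guard are assumptions added for the example; the application only specifies the product and the ratio.

```python
def build_combined_features(avg_current, temperature,
                            capacity_change_rate, cycle_count):
    """Return [interaction term, ratio feature] (hypothetical helper)."""
    # Interaction term: product of average current and temperature.
    interaction = avg_current * temperature
    # Ratio feature: capacity change rate per cycle.
    # Assumption: a battery with zero recorded cycles gets a ratio of 0.0
    # to avoid division by zero.
    ratio = capacity_change_rate / cycle_count if cycle_count > 0 else 0.0
    return [interaction, ratio]

# Illustrative values only.
combined = build_combined_features(
    avg_current=10.5, temperature=26.0,
    capacity_change_rate=-0.05, cycle_count=200,
)
print(combined)
```

The resulting list would be concatenated with the key features before being passed to the fault analysis model.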

Further, S4, clustering the working states of the battery by adopting a DBSCAN clustering algorithm according to the average current, the capacity change rate and the temperature change rate to obtain a working mode division result of the battery, comprises:

S41, acquiring the average current, capacity change rate and temperature change rate from the key features obtained in step S2, and combining them into three-dimensional data points that form the input data of the clustering algorithm;

S42, setting weight coefficients for the average current, the capacity change rate and the temperature change rate by a feature weight calculation method; calculating the weighted Euclidean distances between the three-dimensional data points according to the weight coefficients;

S43, setting the neighborhood radius ε and the neighborhood density threshold MinPts of the DBSCAN clustering algorithm;

S44, for each input data point, calculating the number of data points in its ε-neighborhood; if that number is greater than or equal to MinPts, marking the input data point as a core point;

S45, taking each core point as a starting point, recursively dividing the core points and non-core points in its ε-neighborhood into the same cluster; dividing each non-core point into the cluster of the core point closest to it; marking input data points that are not divided into any cluster as noise points;

S46, according to the clustering result, taking the cluster label of each input data point as its working mode class, taking the noise points as abnormal working states, and outputting the working mode division result of the battery.

Specifically, determination of core points (step S44): to determine whether a data point is a core point, the number of data points within its ε-neighborhood must be counted. Whether one data point lies within the ε-neighborhood of another is decided by comparing the weighted Euclidean distance between the two points with the neighborhood radius ε. If the weighted Euclidean distance between two data points is less than or equal to ε, each lies within the ε-neighborhood of the other.

Cluster formation (step S45): in the process of recursively dividing core points and non-core points in epsilon-neighborhood into the same cluster by taking the core points as starting points, the neighborhood relation among the data points is also required to be judged by using the weighted Euclidean distance. For a core point, all data points in the epsilon-neighborhood (including core points and non-core points) are partitioned into clusters where the core point is located. The determination of whether a data point is within epsilon-neighborhood of the core point is also accomplished by comparing the weighted euclidean distance between them to epsilon.

Assignment of non-core points (step S45): each non-core point is divided into the cluster of its closest core point. "Closest" is determined here by comparing the weighted Euclidean distances between the non-core point and the respective core points: the non-core point joins the cluster of the core point at the smallest weighted Euclidean distance.
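Steps S42 to S46 can be sketched as a small pure-Python DBSCAN variant that uses a weighted Euclidean distance. This is an illustrative sketch, not the application's implementation: the names `weighted_distance` and `dbscan_weighted`, the weights, ε, MinPts and the sample points are all assumed values, the inputs are assumed to be already normalized, and a point is assumed to count as a member of its own ε-neighborhood.

```python
import math
from collections import deque

def weighted_distance(p, q, weights):
    # Weighted Euclidean distance between two feature vectors (step S42).
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))

def dbscan_weighted(points, weights, eps, min_pts):
    n = len(points)
    # epsilon-neighborhood of every point; a point counts as its own neighbor (assumption).
    neighbors = [
        [j for j in range(n)
         if weighted_distance(points[i], points[j], weights) <= eps]
        for i in range(n)
    ]
    # Step S44: core points have at least min_pts points in their neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n  # -1 marks noise, i.e. an abnormal working state
    cluster = 0
    # Step S45, part 1: grow a cluster from each unlabeled core point.
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            k = queue.popleft()
            for j in neighbors[k]:
                if is_core[j] and labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    # Step S45, part 2: a non-core point joins the cluster of the closest
    # core point inside its neighborhood; otherwise it stays noise.
    for i in range(n):
        if is_core[i]:
            continue
        core_nb = [j for j in neighbors[i] if is_core[j]]
        if core_nb:
            nearest = min(
                core_nb,
                key=lambda j: weighted_distance(points[i], points[j], weights),
            )
            labels[i] = labels[nearest]
    return labels

# (average current, capacity change rate, temperature change rate), normalized.
points = [
    (0.10, 0.10, 0.10), (0.12, 0.11, 0.09), (0.11, 0.09, 0.11), (0.13, 0.10, 0.10),
    (0.80, 0.80, 0.80), (0.82, 0.79, 0.81), (0.81, 0.81, 0.79), (0.79, 0.80, 0.82),
    (0.50, 0.10, 0.90),  # isolated point, expected to be labeled as noise
]
labels = dbscan_weighted(points, weights=(0.5, 0.3, 0.2), eps=0.05, min_pts=3)
print(labels)
```

A weighted Euclidean distance is equivalent to an ordinary Euclidean distance on features scaled by the square roots of the weights, so an off-the-shelf DBSCAN implementation could be used on rescaled data instead.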

Further, S5, counting the transition frequencies among different working modes according to the working modes obtained by clustering, and calculating a transition probability matrix that characterizes the probability of the battery transitioning between the different working modes, comprises:

S51, counting the transition frequencies among different working modes according to the working mode division result of the battery, and generating a transition frequency matrix A, wherein the matrix element a_ij represents the frequency of transition from working mode i to working mode j;

S52, generating a transition probability matrix P from the transition frequency matrix A, wherein the element p_ij of P represents the probability of transition from working mode i to working mode j; each row of the transition frequency matrix is divided by the sum of its elements to obtain the transition probabilities;

S53, regularizing the generated transition probability matrix P to obtain a regularized transition probability matrix P_norm;

S54, solving the steady-state distribution π of the Markov chain according to P_norm to obtain the steady-state distribution of the battery under the different working modes;

S55, outputting the transition probability matrix P_norm and the steady-state distribution π of the battery under the different working modes.

The transition probability matrix is obtained by counting the transition frequencies among the working modes, converting them to probabilities and regularizing the result; it characterizes the probability of the battery transitioning between working modes. The steady-state distribution reflects the long-term stable probability distribution of the battery over the working modes and provides a probabilistic basis for subsequent battery fault analysis. The dynamics of working mode transitions are described by a Markov chain model, and the steady-state distribution is solved by eigenvalue decomposition or by a numerical optimization method, depending on the properties of the state transition matrix. When the Markov chain meets the stable distribution condition, the exact steady-state distribution can be solved directly; when it does not, the steady-state distribution is estimated by an optimization method, which improves the applicability and robustness of the steady-state calculation. The steady-state distribution thus provides long-term probability information for battery fault analysis.
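The counting and normalization in steps S51 to S53 can be sketched as follows. The regularization scheme is not specified at this point in the text, so the sketch assumes additive (Laplace) smoothing; the function name `transition_matrices`, the `smoothing` parameter, the uniform fallback for an unvisited mode, and the sample mode sequence are likewise illustrative assumptions. Modes are assumed to be encoded as integers 0 to n_modes - 1.

```python
def transition_matrices(mode_sequence, n_modes, smoothing=1.0):
    """Return (A, P, P_norm) for a time-ordered sequence of working modes."""
    # Step S51: A[i][j] counts observed transitions from mode i to mode j.
    A = [[0] * n_modes for _ in range(n_modes)]
    for src, dst in zip(mode_sequence, mode_sequence[1:]):
        A[src][dst] += 1
    # Step S52: divide each row by its sum.
    # Assumption: a mode that is never left gets a uniform row.
    P = []
    for row in A:
        total = sum(row)
        P.append([c / total if total else 1.0 / n_modes for c in row])
    # Step S53: regularization, assumed here to be additive smoothing,
    # which keeps every row a valid probability distribution.
    P_norm = []
    for row in A:
        total = sum(row) + smoothing * n_modes
        P_norm.append([(c + smoothing) / total for c in row])
    return A, P, P_norm

# Illustrative mode sequence over three working modes.
modes = [0, 0, 1, 0, 2, 0, 0, 1, 1, 0]
A, P, P_norm = transition_matrices(modes, n_modes=3)
print(A)
print(P)
print(P_norm)
```

Smoothing makes every entry of P_norm strictly positive, so transitions that were never observed are treated as rare rather than impossible.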

Further, S54, solving the steady-state distribution π of the Markov chain according to the regularized transition probability matrix P_norm to obtain the steady-state distribution of the battery under the different working modes, comprises: taking P_norm as the state transition matrix of a Markov chain, and constructing a Markov chain model of battery working mode transitions; performing eigenvalue decomposition on the state transition matrix P_norm, and judging whether it meets the stable distribution condition, namely that the maximum eigenvalue is unique and equal to 1; if the stable distribution condition is met, solving the corresponding left eigenvector of P_norm and normalizing it to obtain the steady-state distribution π of the Markov chain; if the stable distribution condition is not met, estimating the steady-state distribution π by a numerical optimization method.

The steady-state distribution π of a Markov chain is the probability distribution to which the state probabilities converge after the chain has undergone state transitions for a sufficiently long time. It satisfies the equation π = πP, where P is the state transition matrix of the Markov chain. In the steady state, the probability of occupying each state is in equilibrium and no longer changes over time. In the Markov chain model of battery working mode transitions, π represents the long-term stable probability distribution of the battery over the working modes, so solving for π reveals the relative frequency and importance of each working mode.

The stable distribution condition means that the state transition matrix P of the Markov chain satisfies certain mathematical properties such that the chain has a unique steady-state distribution π. The condition generally comprises two requirements: the maximum eigenvalue of P is unique and equal to 1; and all other eigenvalues of P have modulus less than 1. Markov chains meeting the stable distribution condition are referred to as ergodic Markov chains, and they are guaranteed to converge to a unique steady-state distribution π after a sufficiently long sequence of state transitions. Judging whether P satisfies the stable distribution condition therefore determines whether the Markov chain of battery working mode transitions has a steady-state distribution. A left eigenvector of P is a row vector v that satisfies vP = λv for an eigenvalue λ; equivalently, its transpose is an ordinary (right) eigenvector of the transposed matrix P^T, since P^T v^T = λ v^T. When P satisfies the stable distribution condition, the normalized left eigenvector corresponding to the maximum eigenvalue 1 is the steady-state distribution π of the Markov chain, so π can be obtained directly by solving for that left eigenvector and normalizing it so that its elements sum to 1.
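The eigenvalue check and left-eigenvector solution can be sketched with NumPy as follows. The function name `steady_state_by_eigen`, the tolerance, and the two example matrices are assumptions for illustration; returning `None` when the condition fails is simply a convenient way for the sketch to signal that the numerical optimization fallback would be needed.

```python
import numpy as np

def steady_state_by_eigen(P, tol=1e-8):
    """Return the steady-state distribution, or None if the condition fails."""
    P = np.asarray(P, dtype=float)
    # Left eigenvectors of P are the right eigenvectors of P transposed.
    eigvals, eigvecs = np.linalg.eig(P.T)
    moduli = np.abs(eigvals)
    # Stable distribution condition: exactly one eigenvalue has modulus 1,
    # and that eigenvalue equals 1; all others have modulus below 1.
    on_unit_circle = np.isclose(moduli, 1.0, atol=tol)
    idx = int(np.argmin(np.abs(eigvals - 1.0)))
    if on_unit_circle.sum() != 1 or not np.isclose(eigvals[idx], 1.0, atol=tol):
        return None
    v = np.real(eigvecs[:, idx])
    # Normalize so the entries sum to 1 (this also fixes the overall sign).
    return v / v.sum()

# Illustrative two-mode chain that meets the condition (eigenvalues 1 and 0.4).
P_example = [[0.9, 0.1],
             [0.5, 0.5]]
pi = steady_state_by_eigen(P_example)
print(pi)

# A periodic chain has eigenvalues 1 and -1, so the condition fails.
periodic = steady_state_by_eigen([[0.0, 1.0],
                                  [1.0, 0.0]])
print(periodic)
```

For the first matrix the result can be checked by hand: π = (5/6, 1/6) satisfies π = πP.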

A numerical optimization method solves an optimization problem through numerical calculation and an iterative algorithm. In the steady-state distribution estimation of the Markov chain, if the state transition matrix P does not meet the stable distribution condition, the steady-state distribution π cannot be solved directly and can instead be estimated by a numerical optimization method. Common numerical optimization methods include gradient descent, quasi-Newton methods and interior-point methods. These methods approach the true steady-state distribution π step by step by iteratively updating an estimate. In each iteration, the search direction and step size are calculated from the current estimate and the optimization objective function (for example, minimizing the difference between the estimate and the true distribution), and the estimate is updated until a convergence condition is reached. Numerical optimization thus provides a flexible, general scheme for steady-state distribution estimation that suits the case in which P does not meet the stable distribution condition.

Preferably, when the state transition matrix P does not meet the stable distribution condition, the steady-state distribution π of the Markov chain is estimated by a quasi-Newton method, comprising the following steps. S1, initialization: initializing an estimate π₀ of the steady-state distribution, generally a uniform distribution or a random distribution; defining an optimization objective function f(π) that measures the difference between the estimate π and the true steady-state distribution, common choices including the KL divergence and the Euclidean distance; setting an upper limit max_iter on the number of iterations and a convergence threshold tol as the stopping conditions of the quasi-Newton method; initializing the approximate Hessian matrix H₀ of the quasi-Newton method, typically as an identity matrix or a diagonal matrix.

S2, in each iteration, updating the estimate of the steady-state distribution by the quasi-Newton method until a stopping condition is reached, comprising: calculating the gradient g_k of the objective function f(π) at the current estimate π_k; calculating the search direction d_k = -H_k^(-1) × g_k by the quasi-Newton update formula, where H_k is the current approximate Hessian matrix; using a line search strategy to find a suitable step size α_k along the search direction d_k such that the objective function value f(π_k + α_k × d_k) decreases; updating the estimate of the steady-state distribution, π_(k+1) = π_k + α_k × d_k; updating the approximate Hessian matrix from π_k, π_(k+1), g_k and g_(k+1) with a quasi-Newton correction formula (such as BFGS or DFP) to obtain H_(k+1); judging whether a stopping condition is met, namely whether the number of iterations has reached max_iter or the change in the estimate is smaller than the convergence threshold tol; if so, stopping the iteration, and otherwise continuing.

S3, after the iteration ends, taking the final estimate π_k as the estimation result for the steady-state distribution of the Markov chain, giving the estimated steady-state distribution π_est. Introducing the optimization objective function f(π) converts steady-state estimation into an optimization problem whose goal is to minimize the difference between the estimate and the true distribution. The quasi-Newton method updates the estimate in each iteration using gradient information and the approximate Hessian matrix, approaching the true steady-state distribution step by step. The line search strategy selects a suitable step size that reduces the objective function value along the search direction, improving optimization efficiency. Updating the approximate Hessian matrix with a quasi-Newton correction formula (such as BFGS or DFP) progressively improves its approximation of the true Hessian matrix and accelerates convergence. The upper limit on iterations and the convergence threshold serve as stopping conditions that terminate the optimization and avoid needless iterations. Estimating the steady-state distribution with the quasi-Newton method thus yields an approximate solution when the state transition matrix P does not satisfy the stable distribution condition. The estimated steady-state distribution π_est gives the long-term stable probability distribution of the battery over the working modes and provides important prior information for subsequent battery state-of-health analysis and prediction.
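The quasi-Newton procedure in S1 to S3 can be sketched as a hand-written BFGS loop with NumPy. Several choices here are assumptions rather than details from the application: the objective is taken to be the squared Euclidean residual ||π - πP||² plus a penalty `rho * (sum(π) - 1)²` that keeps the estimate normalized, the line search is simple Armijo backtracking, and the code stores the inverse of the approximate Hessian directly (a standard BFGS formulation) instead of inverting H_k each iteration. Non-negativity of π is not enforced. All function and parameter names are hypothetical.

```python
import numpy as np

def objective(pi, P, rho=1.0):
    # f(pi) = ||pi - pi P||^2 + rho * (sum(pi) - 1)^2  (assumed objective)
    r = pi - pi @ P
    return r @ r + rho * (pi.sum() - 1.0) ** 2

def gradient(pi, P, rho=1.0):
    r = pi - pi @ P
    identity = np.eye(len(pi))
    return 2.0 * r @ (identity - P).T + 2.0 * rho * (pi.sum() - 1.0)

def estimate_steady_state_bfgs(P, pi0, max_iter=200, tol=1e-10):
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi0, dtype=float)
    n = len(pi)
    H_inv = np.eye(n)  # inverse of the approximate Hessian, H_0 = identity
    g = gradient(pi, P)
    for _ in range(max_iter):
        d = -H_inv @ g  # search direction d_k
        # Backtracking line search with the Armijo sufficient-decrease test.
        alpha, f0, slope = 1.0, objective(pi, P), g @ d
        while (objective(pi + alpha * d, P) > f0 + 1e-4 * alpha * slope
               and alpha > 1e-12):
            alpha *= 0.5
        pi_new = pi + alpha * d
        g_new = gradient(pi_new, P)
        s, y = pi_new - pi, g_new - g
        # BFGS correction of the inverse Hessian, applied only when the
        # curvature condition y.s > 0 holds.
        if y @ s > 1e-12:
            rho_k = 1.0 / (y @ s)
            V = np.eye(n) - rho_k * np.outer(s, y)
            H_inv = V @ H_inv @ V.T + rho_k * np.outer(s, s)
        converged = np.linalg.norm(s) < tol
        pi, g = pi_new, g_new
        if converged:
            break
    return pi

# Periodic two-mode chain: eigenvalues are 1 and -1, so the stable
# distribution condition fails, yet pi = (0.5, 0.5) still satisfies pi = pi P.
P_periodic = [[0.0, 1.0],
              [1.0, 0.0]]
pi_est = estimate_steady_state_bfgs(P_periodic, pi0=[0.9, 0.1])
print(pi_est)
```

A library optimizer with a BFGS option could replace the hand-written loop; it is spelled out here only to mirror the steps listed in the text.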

Further, estimating the steady-state distribution π of the Markov chain by a numerical optimization method comprises: constructing an optimization objective function whose goal is to minimize the difference between the state distribution and the product of the state distribution and the state transition matrix, wherein the state distribution is the steady-state distribution π of the Markov chain to be estimated, and the state transition matrix is the constructed state transition matrix P_norm of the Markov chain; calculating the gradient of the optimization objective function with respect to the steady-state distribution; iteratively updating the steady-state distribution π by a gradient descent algorithm; setting an iteration termination condition, and taking the current steady-state distribution π^k as the estimate of the steady-state distribution of the Markov chain once the iteration terminates.
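For reference, the gradient and update rule for the basic residual objective can be written out as follows. This is a worked derivation under the assumption that π is a row vector and that the difference is measured by the squared Euclidean norm; the step size η is a symbol introduced here for illustration.

```latex
\begin{aligned}
J(\pi) &= \lVert \pi - \pi P_{\mathrm{norm}} \rVert^2
        = \lVert \pi \,(I - P_{\mathrm{norm}}) \rVert^2, \\
\nabla_{\pi} J(\pi) &= 2\,\pi\,(I - P_{\mathrm{norm}})(I - P_{\mathrm{norm}})^{\mathsf{T}}, \\
\pi^{k+1} &= \pi^{k} - \eta\,\nabla_{\pi} J(\pi^{k}).
\end{aligned}
```

Because π must remain a probability distribution, a practical implementation would also keep the entries non-negative and summing to 1 after each update, for example by projecting the gradient onto the sum-preserving directions.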

Further, the expression of the optimization objective function is as follows:

$$\min_{\pi}\; J(\pi) = \lVert \pi - \pi P_{\mathrm{norm}} \rVert^2 + \lambda \lVert \pi - \pi_0 \rVert^2 + \mu \lVert \pi - \pi_f \rVert^2$$

Here π₀ is the prior distribution of the battery failure modes, set according to historical failure data; π_f is the ideal distribution of the battery failure modes, representing the expected failure mode distribution and set according to the performance and safety requirements of the battery; λ and μ are balance factors that control the weights of the prior distribution term and the ideal distribution term, respectively.

Specifically, the first term ||π - πP_norm||² in the objective function measures the difference between the estimated steady-state distribution π and its product with the current state transition matrix P_norm. Minimizing this term makes the estimated steady-state distribution as consistent as possible with the current state transition characteristics, that is, consistent with the currently observed transition behavior of the battery working state. The second term λ||π - π₀||² introduces the influence of the prior distribution π₀ and measures the difference between the estimated steady-state distribution and the prior distribution. Minimizing this term lets the estimate take historical experience into account to some extent and avoids overfitting to the current observations. Adjusting the relative weight of the two terms through the balance factor λ balances the current state transition characteristics against historical experience and improves the robustness of the steady-state distribution estimate.

The weighting factor λ controls how strongly the prior distribution influences the estimate. The term is computed as:

$$\lambda \lVert \pi - \pi_0 \rVert^2 = \lambda \left[ (\pi_1 - \pi_{01})^2 + (\pi_2 - \pi_{02})^2 + \cdots + (\pi_n - \pi_{0n})^2 \right]$$

where π₀ is the n-dimensional prior distribution vector. This term calculates the sum of squares of the element-wise differences between π and π₀ and multiplies it by the weighting factor λ.

Specifically, μ||π - π_f||² represents the weighted squared Euclidean distance between the steady-state distribution π and the ideal failure mode distribution π_f. π - π_f is the difference vector between the two distributions: it takes the difference between π and π_f on each element, resulting in an n-dimensional vector of element-by-element differences. ||π - π_f||² is the square of the Euclidean norm (L2 norm) of this vector.

In particular, the third term μ||π - π_f||² in the objective function introduces the effect of the ideal distribution π_f and represents the difference between the estimated steady-state distribution and the preset ideal distribution. Minimizing this term brings the estimated steady-state distribution as close as possible to the ideal distribution while still respecting the current state transition characteristics and historical experience. The ideal distribution can be set according to expectations for the state of health of the battery or expert knowledge, and reflects the expected long-term stable operating state of the battery. Introducing the ideal distribution constraint helps guide the steady-state distribution estimate toward a healthier and more stable state and improves the accuracy of battery fault prediction.

For a vector x = (x₁, x₂, ..., x_n), the square of its Euclidean norm is ||x||² = x₁² + x₂² + ... + x_n², i.e., the sum of squares of the elements of the vector, which measures the length or size of the vector. μ is a positive real number, a weighting or balance factor that controls the importance of this term in the overall optimization objective function. μ||π - π_f||² is therefore the weighted sum of squared differences between the steady-state distribution π and the ideal failure mode distribution π_f: it measures the difference between π and π_f on each element and, through squaring, amplifies the effect of larger differences. The term acts as a regularization term that guides the steady-state distribution estimate toward the ideal failure mode distribution. By minimizing it, the estimated steady-state distribution is brought as close as possible to the expected failure mode distribution, thereby improving the accuracy of fault diagnosis and prediction.
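As a non-limiting illustration, the objective function J(π) can be evaluated as in the following minimal sketch. The function names, the two-mode transition matrix and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; π is treated as a row vector.

```python
def row_times_matrix(pi, P):
    """Return the row vector pi multiplied by the matrix P."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def squared_distance(u, v):
    """Squared Euclidean norm of u - v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def objective(pi, P_norm, pi0, pi_f, lam, mu):
    """J(pi) = ||pi - pi P_norm||^2 + lam ||pi - pi0||^2 + mu ||pi - pi_f||^2."""
    consistency = squared_distance(pi, row_times_matrix(pi, P_norm))
    prior = lam * squared_distance(pi, pi0)
    ideal = mu * squared_distance(pi, pi_f)
    return consistency + prior + ideal


# Hypothetical two-mode example (illustrative values, not from the application).
P_norm = [[0.9, 0.1], [0.5, 0.5]]
pi = [0.5, 0.5]
J = objective(pi, P_norm, pi0=[0.5, 0.5], pi_f=[1.0, 0.0], lam=1.0, mu=2.0)
# consistency = 0.08, prior term = 0.0, ideal term = 2 * 0.5 = 1.0
```

In this example the prior term vanishes because π equals π₀, so the value of J is determined by the consistency term and the ideal distribution term.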

Further, the gradient of the objective function with respect to the steady-state distribution is calculated by the following formula:

∇J(π) = 2(π - πP_norm)(I - P_norm^T) + 2λ(π - π₀) + 2μ(π - π_f)

Wherein I is the n×n identity matrix and P_norm^T is the transpose of the regularized state transition matrix P_norm. Specifically, the gradient term of the difference between the steady-state distribution and the ideal fault mode distribution is added to the original gradient terms, so that the gradient direction is influenced jointly by the transition matrix, the prior distribution and the ideal distribution.
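For reference, the gradient can be derived term by term from the objective function J(π) stated above. The derivation below is a sketch that treats π as a row vector and introduces the residual r(π) as an auxiliary symbol that does not appear in the application.

```latex
\begin{aligned}
r(\pi) &= \pi - \pi P_{\mathrm{norm}} = \pi\,(I - P_{\mathrm{norm}}),\\
\nabla_{\pi}\,\lVert r(\pi)\rVert^{2}
  &= 2\,r(\pi)\,(I - P_{\mathrm{norm}})^{T}
   = 2\,(\pi - \pi P_{\mathrm{norm}})\,(I - P_{\mathrm{norm}}^{T}),\\
\nabla_{\pi}\,\lambda\lVert \pi - \pi_{0}\rVert^{2} &= 2\lambda\,(\pi - \pi_{0}),\\
\nabla_{\pi}\,\mu\lVert \pi - \pi_{f}\rVert^{2} &= 2\mu\,(\pi - \pi_{f}),\\
\nabla J(\pi) &= 2\,(\pi - \pi P_{\mathrm{norm}})\,(I - P_{\mathrm{norm}}^{T})
  + 2\lambda\,(\pi - \pi_{0}) + 2\mu\,(\pi - \pi_{f}).
\end{aligned}
```

The first line explains why both the identity matrix I and the transpose P_norm^T appear in the gradient: the residual is linear in π, and differentiating its squared norm multiplies the residual by the transpose of (I - P_norm).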

Further, the steady-state distribution is updated by the following formula: π^(k+1) = π^k - α_k v^(k+1), wherein v^k represents the momentum term of the k-th iteration, β is the momentum factor that controls the influence of the historical gradients, and α_k represents the adaptive learning rate of the k-th iteration, which is adjusted using the AdaGrad, RMSprop or Adam optimization algorithm. Specifically, a momentum term and an adaptive learning rate are introduced into the iterative update of the gradient descent algorithm, further improving convergence speed and stability.

Further, the iteration termination condition includes any one of the following:

k≥K_max

|J(π^(k+1)) - J(π^k)| ≤ ε

||π^(k+1) - π^k|| ≤ ε

||πP_norm-π||≤ε_p

||π-π_f||≤ε_f

Wherein K_max is the preset maximum number of iterations, ε is the preset convergence threshold, ε_p is the threshold for battery failure mode transition stability, and ε_f is the similarity threshold between the steady-state distribution and the ideal failure mode distribution. When any one of the conditions is met, the iterative process is terminated and the current steady-state distribution estimate π^k is taken as the estimate of the steady-state distribution of the Markov chain.
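The iterative estimation with the five termination conditions can be sketched as follows. This is an illustrative sketch under stated assumptions and not a definitive implementation: the momentum recursion v^(k+1) = βv^k + ∇J(π^k) is assumed because the application does not spell it out, a constant learning rate stands in for AdaGrad, RMSprop or Adam, and the clipping and renormalization that keep π a probability vector are also assumptions. All function names and numeric values are hypothetical.

```python
import math


def row_times_matrix(pi, P):
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def diff(u, v):
    return [a - b for a, b in zip(u, v)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def objective(pi, P, pi0, pi_f, lam, mu):
    r = diff(pi, row_times_matrix(pi, P))
    return (norm(r) ** 2 + lam * norm(diff(pi, pi0)) ** 2
            + mu * norm(diff(pi, pi_f)) ** 2)


def gradient(pi, P, pi0, pi_f, lam, mu):
    n = len(pi)
    r = diff(pi, row_times_matrix(pi, P))
    grad = []
    for j in range(n):
        # j-th component of r (I - P^T) is r_j - sum_i r_i * P[j][i]
        r_m = r[j] - sum(r[i] * P[j][i] for i in range(n))
        grad.append(2 * r_m + 2 * lam * (pi[j] - pi0[j])
                    + 2 * mu * (pi[j] - pi_f[j]))
    return grad


def estimate_steady_state(P, pi0, pi_f, lam, mu, lr=0.05, beta=0.1,
                          k_max=10000, eps=1e-12, eps_p=1e-6, eps_f=1e-6):
    n = len(P)
    pi = [1.0 / n] * n          # start from the uniform distribution
    v = [0.0] * n
    j_old = objective(pi, P, pi0, pi_f, lam, mu)
    for k in range(1, k_max + 1):
        g = gradient(pi, P, pi0, pi_f, lam, mu)
        v = [beta * vi + gi for vi, gi in zip(v, g)]   # assumed momentum recursion
        new = [max(p - lr * vi, 0.0) for p, vi in zip(pi, v)]
        total = sum(new)
        new = [p / total for p in new]                 # assumed renormalization
        j_new = objective(new, P, pi0, pi_f, lam, mu)
        checks = {
            "objective change": abs(j_new - j_old) <= eps,
            "distribution change": norm(diff(new, pi)) <= eps,
            "transition stability": norm(diff(row_times_matrix(new, P), new)) <= eps_p,
            "close to ideal": norm(diff(new, pi_f)) <= eps_f,
        }
        pi, j_old = new, j_new
        for reason, met in checks.items():
            if met:
                return pi, reason, k
    return pi, "max iterations", k_max


# Hypothetical periodic two-mode chain, for which plain power iteration oscillates.
P_example = [[0.0, 1.0], [1.0, 0.0]]
pi_hat, reason, iters = estimate_steady_state(
    P_example, pi0=[0.6, 0.4], pi_f=[0.7, 0.3], lam=0.1, mu=0.1)
```

For this example the estimate settles slightly away from the uniform distribution, because the prior and ideal terms both pull the first component upward while the consistency term pulls the two components together.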

Compared with the prior art, the application has the advantages that:

By preprocessing the battery data and extracting key features, noise and redundant information in the data are effectively eliminated, the data quality is improved, and a foundation is laid for subsequent analysis. Meanwhile, combination features such as an interaction term of average current and temperature, a ratio of capacity change rate to cycle number and the like are introduced, interaction among different features is fully considered, feature expression capability is enhanced, and fault analysis accuracy is improved.

The DBSCAN clustering algorithm is adopted to carry out self-adaptive clustering on the working state of the battery, and the limitation of manually setting the clustering quantity in the traditional method is overcome. By setting the neighborhood radius epsilon and the neighborhood density threshold MinPts, the clustering structure is automatically identified, the battery working mode is found, and the rationality and the interpretability of the clustering result are improved. The weighted Euclidean distance is introduced, different characteristics are given with weight coefficients, the influence of key characteristics is highlighted, and the clustering effect is improved.

Based on the clustering result, counting the transfer frequency among different working modes, constructing a transfer probability matrix, and describing the dynamic transfer characteristic of the working state of the battery. Further, by solving the steady-state distribution of the Markov chain, the long-term stability probability distribution of the battery in different working modes is obtained, the health evolution rule of the battery is revealed, and important prior information is provided for fault prediction.

For the case in which the transition probability matrix does not satisfy the stationary distribution condition, a numerical optimization method is used to estimate the steady-state distribution, which improves the applicability and robustness of the steady-state distribution estimation. By constructing an optimization objective function comprising a prior distribution term and an ideal distribution term and introducing balance factors, the current state transition characteristics are fitted while historical experience and the expected target are taken into account, so that the steady-state distribution estimate is more comprehensive and reliable.

In the iterative optimization process of estimating steady-state distribution, a self-adaptive learning rate adjustment strategy, such as AdaGrad, RMSprop or Adam, is adopted to dynamically adjust the step length of each iteration, so that the convergence speed is increased. Meanwhile, diversified iteration termination conditions are introduced, the convergence of the optimization process is comprehensively evaluated, the estimation accuracy is ensured, and meanwhile, the calculation efficiency is improved.

The clustering result, the key features and the combined features are input into a fault analysis model based on the neural network, and a complex mapping relation between the working state and the fault state of the battery is established by utilizing the strong nonlinear fitting capability of the neural network. The model is trained through a supervised learning algorithm, the model parameters are adjusted in a self-adaptive mode, and the generalization capability and the prediction accuracy of the fault analysis model are improved.

Drawings

FIG. 1 is an exemplary flow chart of a battery analysis method according to some embodiments of the application;

FIG. 2 is an exemplary flow chart for acquiring combined features according to some embodiments of the application;

FIG. 3 is an exemplary flow chart for obtaining a battery operating mode partitioning result according to some embodiments of the present application;

FIG. 4 is an exemplary flow chart of outputting a probability transition matrix based on clustering results, according to some embodiments of the application;

FIG. 5 is an exemplary flow chart for solving steady-state distributions of Markov chains, according to some embodiments of the application.

Detailed Description

The method and system provided by the embodiment of the application are described in detail below with reference to the accompanying drawings.

Fig. 1 is an exemplary flow chart of a battery analysis method according to some embodiments of the application, comprising: acquiring data of a battery from an energy storage Battery Management System (BMS), and preprocessing the acquired data; wherein the data comprises current, voltage, temperature, capacity, cycle number and accumulated use time; extracting key features from the preprocessed data; the key characteristics comprise average current, capacity change rate, temperature change rate, cycle times and accumulated use time; constructing a combined feature according to the key feature; wherein, the combined characteristic comprises an interaction item of average current and temperature and a ratio of capacity change rate to cycle number; clustering the working states of the batteries by adopting a DBSCAN clustering algorithm according to the average current, the capacity change rate and the temperature change rate to obtain a working mode division result of the batteries; according to the working modes obtained by clustering, counting the transition frequency among different working modes, and calculating a transition probability matrix, wherein the transition probability matrix is used for representing the probability of transition of the battery among different working modes; constructing a neural network-based fault analysis model, taking a working mode division result, key features and combined features as input, taking a fault state of a battery as output, and training the fault analysis model by adopting a supervised learning algorithm; and inputting the battery data to be processed into a trained fault analysis model, and predicting the fault probability of the battery.

Data of the battery is acquired from the energy storage battery management system (BMS), and the acquired data is preprocessed; the data comprises current, voltage, temperature, capacity, cycle number and accumulated use time. Specifically, a communication connection is established with the BMS, and real-time monitoring data of the battery is periodically obtained from it. The BMS uses a communication protocol such as CAN bus or RS485 to send the monitoring data of the battery to an external system at a fixed frequency (e.g., 1 Hz). The battery monitoring data sent by the BMS is received, the data frames are parsed, and the following key data fields are extracted: current: the charging and discharging current of the battery, in amperes (A), with a precision of 0.1 A and a data range of -1000 A to 1000 A; voltage: the port voltage of the battery, in volts (V), with a precision of 0.1 V and a data range of 0 V to 5 V; temperature: the surface temperature of the battery, in degrees Celsius (°C), with a precision of 0.1 °C and a data range of -50 °C to 100 °C; capacity: the current remaining capacity of the battery, in ampere-hours (Ah), with a precision of 0.1 Ah and a data range of 0 Ah to 1000 Ah; cycle number: the number of charge and discharge cycles of the battery, with a precision of 1 cycle and a data range of 0 to 10000 cycles; accumulated use time: the accumulated working time of the battery, in hours (h), with a precision of 0.1 h and a data range of 0 h to 50000 h.

The parsed battery monitoring data is preprocessed, which specifically comprises the following steps: data cleaning: detecting and eliminating abnormal values and invalid values, such as out-of-range data and abrupt jumps in the data; data normalization: mapping data with different units to a common scale, for example mapping the data to the interval [0,1] by min-max normalization; data smoothing: smoothing the data by methods such as moving average or exponential smoothing to reduce fluctuation and noise; data sampling: sampling the data at fixed time intervals (e.g., 1 minute) to obtain equally spaced time-series data. The preprocessed battery monitoring data is converted into a standardized data format, such as JSON or CSV, and stored in a local database or in memory as input for subsequent data analysis.
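The four preprocessing steps can be sketched as follows. This is a minimal illustration with hypothetical function names and readings; it covers only range-based cleaning (detection of abrupt jumps is not shown), uses a trailing moving average as the smoothing method, and uses a small sampling step in place of the 1-minute interval.

```python
def clean(values, low, high):
    """Data cleaning: drop readings outside the valid physical range."""
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Data normalization: map values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window):
    """Data smoothing: trailing moving average over up to `window` samples."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def sample_every(values, step):
    """Data sampling: keep every `step`-th value (e.g. 60 for 1 Hz -> 1 min)."""
    return values[::step]


# Hypothetical temperature readings in degrees Celsius; 300.0 is out of range.
raw = [25.0, 25.2, 300.0, 25.4, 25.6, 25.8]
cleaned = clean(raw, low=-50.0, high=100.0)
normalized = min_max_normalize(cleaned)
smoothed = moving_average(normalized, window=2)
sampled = sample_every(smoothed, step=2)
```

The steps are applied in the order in which they are listed above; a different order, for example smoothing before normalization, would give slightly different values.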

Key features are extracted from the preprocessed data; the key features comprise average current, capacity change rate, temperature change rate, cycle number and accumulated use time. The features are extracted from the preprocessed battery monitoring data over a fixed time window (e.g., 1 hour). The choice of time window must balance data volume against computational efficiency while taking into account the time scale on which battery performance changes. For the data within each time window, the following key features are calculated. Average current: the arithmetic mean of the battery current over the time window, reflecting the average charge-discharge intensity of the battery. The calculation formula is I_avg = (Σ I_i)/N, where I_avg is the average current, I_i is the current value at the i-th sampling point, and N is the number of sampling points in the time window. Capacity change rate: the relative rate of change of the battery capacity over the time window, reflecting the rate of decay of the battery capacity. The calculation formula is C_rate = (C_end - C_start)/C_start, where C_rate is the capacity change rate, C_start is the battery capacity at the beginning of the time window, and C_end is the battery capacity at the end of the time window. Temperature change rate: the relative rate of change of the battery temperature over the time window, reflecting the trend of the battery temperature. The calculation formula is T_rate = (T_end - T_start)/T_start, where T_rate is the temperature change rate, T_start is the battery temperature at the beginning of the time window, and T_end is the battery temperature at the end of the time window. Cycle number: the number of charge and discharge cycles of the battery at the end of the time window, reflecting the life-cycle stage of the battery; the cycle number is taken directly from the monitoring data at the end of the time window. Accumulated use time: the accumulated working time of the battery at the end of the time window, reflecting how long the battery has been in service; the accumulated use time is taken directly from the monitoring data at the end of the time window.
The extracted key features are organized into time-series data in time-window order, each time window corresponding to one group of key features, to form a feature matrix. Each row of the feature matrix is the feature vector of one time window, and each column holds the values of one key feature across the time windows. The feature matrix is standardized so that feature values of different orders of magnitude are mapped to similar scales, improving the comparability of the features. Common standardization methods include min-max normalization and zero-mean unit-variance standardization. The standardized feature matrix is stored in a local database or in memory as input for subsequent data analysis.
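A minimal sketch of the per-window feature calculation is given below. The function name, dictionary keys and sample values are hypothetical. The application does not say how a start value of zero is handled, so the sketch simply lets the division raise an error in that case.

```python
def window_features(current, capacity, temperature, cycle_count, use_time):
    """Compute the five key features for one time window.

    Each argument is the list of samples of one signal within the window.
    A start capacity or start temperature of 0 raises ZeroDivisionError.
    """
    n = len(current)
    i_avg = sum(current) / n                                       # I_avg
    c_rate = (capacity[-1] - capacity[0]) / capacity[0]            # C_rate
    t_rate = (temperature[-1] - temperature[0]) / temperature[0]   # T_rate
    return {
        "I_avg": i_avg,
        "C_rate": c_rate,
        "T_rate": t_rate,
        "Cycle_count": cycle_count[-1],   # value at the end of the window
        "Use_time": use_time[-1],         # value at the end of the window
    }


# Hypothetical window with four samples per signal.
features = window_features(
    current=[10.0, 12.0, 14.0, 12.0],
    capacity=[100.0, 99.8, 99.6, 99.5],
    temperature=[25.0, 25.5, 26.0, 26.25],
    cycle_count=[200, 200, 200, 200],
    use_time=[1500.0, 1500.3, 1500.6, 1501.0],
)
```

Calling the function once per time window and stacking the returned values in window order yields the rows of the feature matrix.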

FIG. 2 is an exemplary flow chart for obtaining combined features according to some embodiments of the application. Combined features are constructed from the key features; the combined features comprise an interaction term of average current and temperature and a ratio of capacity change rate to cycle number. Combined features reflecting battery performance are constructed on the basis of the extracted key features. Because they are obtained by applying mathematical operations to several key features, they capture the interactions and relative strengths among the key features and provide richer battery performance information. An interaction feature of average current and temperature is constructed to reflect the combined influence of current and temperature on battery performance. The calculation formula is IT_interaction = I_avg × T_avg, where IT_interaction is the interaction term of average current and temperature, I_avg is the average current, and T_avg is the arithmetic mean of the battery temperature over the time window. The physical meaning of the interaction term is that current and temperature are two key factors affecting battery performance and that they interact: a higher current accelerates heating of the battery and raises its temperature, while a higher temperature accelerates aging of the battery and reduces its capacity. The interaction term quantitatively describes the effect of the combined action of current and temperature.

A ratio feature of capacity change rate to cycle number is constructed to reflect the rate of battery capacity decay per cycle. The calculation formula is CC_ratio = C_rate / Cycle_count, where CC_ratio is the ratio of the capacity change rate to the cycle number, C_rate is the capacity change rate, and Cycle_count is the cycle number. The physical significance of the ratio feature is that the decay of battery capacity is closely related to the cycle number, but the rate of capacity decay may vary from battery to battery. By calculating the capacity change rate per cycle, the capacity fade characteristics of different batteries can be normalized and batteries with abnormal capacity fade can be identified. The interaction feature of average current and temperature (IT_interaction) and the ratio feature of capacity change rate to cycle number (CC_ratio) are added to the generated feature matrix to form a new combined feature matrix. The combined feature matrix is standardized so that the interaction term and the ratio feature are mapped to a scale similar to that of the key features, improving the comparability of the features. The standardized combined feature matrix is stored in a local database or in memory and, together with the generated key feature matrix, serves as subsequent input data.
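The two combined features and the subsequent standardization can be sketched as follows. The function names and window values are hypothetical. Returning None for a cycle number of zero is an assumption, since the application does not describe that case, and zero-mean unit-variance standardization is used here as one of the standardization methods mentioned above.

```python
import statistics


def combined_features(i_avg, t_avg, c_rate, cycle_count):
    """Return (IT_interaction, CC_ratio) for one time window."""
    it_interaction = i_avg * t_avg
    # Assumption: a battery with zero recorded cycles has no defined ratio.
    cc_ratio = c_rate / cycle_count if cycle_count else None
    return it_interaction, cc_ratio


def standardize_columns(rows):
    """Zero-mean, unit-variance standardization of each column."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical windows: (I_avg, T_avg, C_rate, Cycle_count).
windows = [
    (12.0, 25.5, -0.005, 200),
    (8.0, 24.0, -0.004, 201),
    (15.0, 27.0, -0.006, 202),
]
rows = [list(combined_features(*w)) for w in windows]
standardized = standardize_columns(rows)
```

After standardization each column has zero mean, so the interaction term, whose raw values are in the hundreds, and the ratio feature, whose raw values are of the order of 1e-5, become comparable in scale.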

Fig. 3 is an exemplary flowchart for obtaining the operating mode division result of a battery according to some embodiments of the present application. Clustering the operating states of the battery with a clustering algorithm according to the average current, the capacity change rate and the temperature change rate to obtain the operating mode division result comprises the following. The three features average current (I_avg), capacity change rate (C_rate) and temperature change rate (T_rate) are extracted from the generated key feature matrix and combined into three-dimensional data points (I_avg, C_rate, T_rate) to form the input data set of the clustering algorithm. A feature weight calculation method is used to set the weight coefficients w_I, w_C and w_T of the average current, the capacity change rate and the temperature change rate, reflecting the importance of the different features to the operating state of the battery. The weight coefficients may be set according to expert experience or data analysis results and satisfy w_I + w_C + w_T = 1.

Based on the weight coefficients, the weighted Euclidean distance d_ij between any two data points (x_i, y_i, z_i) and (x_j, y_j, z_j) in the input data set is calculated as the similarity measure of the clustering algorithm. The calculation formula is d_ij = sqrt(w_I (x_i - x_j)² + w_C (y_i - y_j)² + w_T (z_i - z_j)²). The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is selected to cluster the input data set, and clusters are partitioned according to the spatial density of the data points. Two key parameters of the DBSCAN algorithm are set: the neighborhood radius ε and the neighborhood density threshold MinPts. ε defines the neighborhood of a data point, i.e., the hypersphere centered on the data point with radius ε; MinPts defines the condition for a core point, namely that the number of data points in the neighborhood is at least MinPts. For each data point in the input data set, the number of data points in its ε-neighborhood is calculated. If that number is greater than or equal to MinPts, the data point is marked as a core point. Starting from each core point, the data points that are density-reachable from it are recursively assigned to the same cluster. For a core point, all core points and non-core points in its ε-neighborhood are assigned to the same cluster; a non-core point is assigned to the cluster of the core point closest to it. Data points that are not assigned to any cluster are marked as noise points, representing abnormal operating states. According to the cluster division result, the cluster label of each input data point is taken as the operating mode category of the battery in the corresponding time window, generating the battery operating mode division result.
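The clustering procedure described above can be sketched in plain Python as follows. The sketch makes two assumptions that are labeled in the code: a point is not counted in its own ε-neighborhood, and a non-core point is attached to the cluster of its nearest core point within ε. The data points and the value ε = 0.02 are hypothetical and are not the values of Table 1.

```python
import math


def weighted_distance(p, q, w):
    """Weighted Euclidean distance sqrt(sum_k w_k * (p_k - q_k)^2)."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, p, q)))


def dbscan(points, w, eps, min_pts):
    """Return one label per point: 1, 2, ... for clusters, -1 for noise."""
    n = len(points)
    # Assumption: a point is not counted in its own eps-neighborhood.
    neighbors = [
        [j for j in range(n)
         if j != i and weighted_distance(points[i], points[j], w) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    # Grow one cluster from each unlabeled core point through chains of cores.
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        cluster += 1
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for v in neighbors[u]:
                if core[v] and labels[v] == -1:
                    labels[v] = cluster
                    stack.append(v)
    # Assumption: a non-core point joins the cluster of its nearest core point.
    for i in range(n):
        if core[i]:
            continue
        near_cores = [j for j in neighbors[i] if core[j]]
        if near_cores:
            nearest = min(
                near_cores,
                key=lambda j: weighted_distance(points[i], points[j], w))
            labels[i] = labels[nearest]
    return labels


# Hypothetical (I_avg, C_rate, T_rate) points: two groups and one outlier.
points = [
    (0.10, 0.0, 0.0), (0.11, 0.0, 0.0), (0.12, 0.0, 0.0), (0.13, 0.0, 0.0),
    (0.50, 0.0, 0.0), (0.51, 0.0, 0.0), (0.52, 0.0, 0.0), (0.53, 0.0, 0.0),
    (0.90, 0.5, 0.5),
]
labels = dbscan(points, w=(0.5, 0.3, 0.2), eps=0.02, min_pts=3)
```

In this example the two inner points of each group are core points, the two outer points of each group are non-core points that are attached to the nearest core point, and the isolated last point is marked as noise with label -1.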

Specifically, a DBSCAN clustering algorithm is adopted to perform clustering analysis on the working state of a certain battery to obtain a working mode division result of the battery, wherein the obtained key feature data are shown in Table 1:

Table 1 Key feature data

Sampling sequence number	I_avg (average current)	C_rate (capacity change rate)	T_rate (temperature change rate)
1	0.25	-0.02	0.05
2	0.18	-0.01	0.03
3	0.32	-0.03	0.08
4	0.05	-0.05	0.01
5	0.28	-0.02	0.06
6	0.20	-0.01	0.04
7	0.35	-0.04	0.09
8	0.08	-0.06	0.02
9	0.30	-0.03	0.07
10	0.22	-0.02	0.05

The I_avg (average current), C_rate (capacity change rate) and T_rate (temperature change rate) values are combined into three-dimensional data points to form the input data set of the DBSCAN clustering algorithm. The weight coefficients of the average current, the capacity change rate and the temperature change rate are set as w_I = 0.5, w_C = 0.3 and w_T = 0.2, satisfying w_I + w_C + w_T = 1. The weighted Euclidean distance between any two data points in the input data set is calculated according to the weight coefficients. The neighborhood radius of the DBSCAN clustering algorithm is set to ε = 0.05 and the neighborhood density threshold to MinPts = 3. For each input data point, the number of data points in its ε-neighborhood is calculated. Taking the 1st data point (0.25, -0.02, 0.05) as an example, its ε-neighborhood contains the 5th data point (0.28, -0.02, 0.06) and the 10th data point (0.22, -0.02, 0.05); the number of data points is 2, which is less than MinPts, so the 1st data point is not a core point. Cluster division is then carried out recursively with each core point as a starting point. Calculation shows that the 3rd data point (0.32, -0.03, 0.08), the 7th data point (0.35, -0.04, 0.09) and the 9th data point (0.30, -0.03, 0.07) are core points, and the data points in the ε-neighborhoods of these core points are divided into the same cluster. The 4th data point (0.05, -0.05, 0.01) and the 8th data point (0.08, -0.06, 0.02) do not belong to the ε-neighborhood of any core point and are marked as noise points. The remaining data points are assigned to clusters according to their distance from the nearest core point. According to the cluster division result, the output operating mode division result of the battery is shown in Table 2, where 1 represents a first normal operating mode, 2 represents a second normal operating mode, and -1 represents an abnormal operating state (noise point):

Table 2 results of operation mode division of battery

The steady-state distribution of the battery is then obtained from the operating modes produced by clustering. Fig. 4 is an exemplary flowchart of outputting a probability transition matrix according to a clustering result according to some embodiments of the present application. The transition frequencies between different operating modes are counted according to the mode division result to generate a transition frequency matrix A. If the battery has m operating modes, A is an m×m matrix whose element a_ij represents the number of times the battery transitioned from mode i to mode j in the given data set. The transition probability matrix P is then calculated from A. For the element p_ij of P, the calculation formula is p_ij = a_ij / Σ_j a_ij, where Σ_j a_ij is the sum of the elements of the i-th row of A, i.e., the total number of transitions out of operating mode i. This step normalizes the transition frequencies into transition probabilities, ensuring Σ_j p_ij = 1 for each operating mode i.
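Counting transitions and normalizing each row can be sketched as follows. The function names and the label sequence are hypothetical. Leaving an all-zero row for a mode that is never left is an assumption, since the application does not describe that case.

```python
def transition_counts(labels, modes):
    """Build the transition frequency matrix A from a mode label sequence.

    A[i][j] counts transitions from modes[i] to modes[j] between
    consecutive time windows.
    """
    index = {mode: k for k, mode in enumerate(modes)}
    A = [[0] * len(modes) for _ in modes]
    for src, dst in zip(labels, labels[1:]):
        A[index[src]][index[dst]] += 1
    return A


def transition_probabilities(A):
    """Row-normalize A so that p_ij = a_ij / sum_j a_ij."""
    P = []
    for row in A:
        total = sum(row)
        # Assumption: a mode with no outgoing transitions keeps a zero row.
        P.append([a / total for a in row] if total else [0.0] * len(row))
    return P


# Hypothetical mode sequence over eight time windows (-1 denotes noise).
modes = [-1, 1, 2]
sequence = [1, 1, 2, 1, -1, 2, 2, 1]
A = transition_counts(sequence, modes)
P = transition_probabilities(A)
```

A sequence of eight windows contains seven consecutive pairs, so the entries of A sum to seven.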

The generated transition probability matrix P is regularized to obtain the regularized transition probability matrix P_norm. The regularization process comprises: adding a small positive constant ε to each element p_ij of P, giving p_ij' = p_ij + ε, which prevents any transition probability from being 0 and improves numerical stability; and then normalizing each row so that Σ_j p_ij' = 1. The normalized matrix is the regularized transition probability matrix P_norm.
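The regularization step can be sketched as follows. The function name, the example matrix and the value ε = 0.01 are hypothetical; in practice a much smaller constant would typically be chosen.

```python
def regularize(P, eps):
    """Add eps to every entry of P, then renormalize each row to sum to 1."""
    P_norm = []
    for row in P:
        shifted = [p + eps for p in row]
        total = sum(shifted)
        P_norm.append([p / total for p in shifted])
    return P_norm


# Hypothetical matrix with one zero entry; eps is exaggerated for readability.
P = [[0.0, 1.0], [0.5, 0.5]]
P_norm = regularize(P, eps=0.01)
# First row becomes [0.01 / 1.02, 1.01 / 1.02].
```

After this step every entry is strictly positive, so every operating mode can be reached from every other mode in one step.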

FIG. 5 is an exemplary flow chart for solving the steady-state distribution of a Markov chain according to some embodiments of the application. The regularized transition probability matrix P_norm is used as the state transition matrix of the Markov chain to construct a Markov chain model of battery operating mode transitions. Eigenvalue decomposition is then performed on the state transition matrix to judge whether the stationary distribution condition is satisfied, and the stationary distribution of the Markov chain is solved according to the decomposition result. If the battery has m operating modes, the regularized transition probability matrix P_norm is an m×m matrix. Taking P_norm as the state transition matrix of the Markov chain means that the transitions of the battery between operating modes satisfy the Markov property, i.e., the operating mode at the next moment depends only on the operating mode at the current moment and not on earlier operating modes. Eigenvalue decomposition of the state transition matrix P_norm yields the eigenvalues λ_1, λ_2, ..., λ_m and the corresponding eigenvectors v_1, v_2, ..., v_m. According to the Perron-Frobenius theorem, if P_norm is an irreducible non-negative matrix, its maximum eigenvalue λ_1 is unique and all components of the corresponding eigenvector v_1 are positive.

Whether the state transition matrix P_norm satisfies the stationary distribution condition is determined by judging whether the maximum eigenvalue λ_1 is unique and equal to 1. If the maximum eigenvalue λ_1 is unique and equal to 1, the state transition matrix P_norm satisfies the stationary distribution condition. This means that the Markov chain has a unique steady-state distribution π that is independent of the initial state distribution. If the maximum eigenvalue λ_1 is not unique or is not equal to 1, the state transition matrix P_norm does not satisfy the stationary distribution condition. In this case, the state distribution of the Markov chain may change periodically or converge to a plurality of different distributions.

When the state transition matrix P_norm satisfies the stationary distribution condition, the steady-state distribution π of the Markov chain can be obtained by solving for the left eigenvector of the state transition matrix. The left eigenvector π satisfies the equation πP_norm = π, where π is a 1×m row vector representing the steady-state probabilities of the Markov chain over the m states. The left eigenvector π can be solved using a numerical algorithm such as the power method, the inverse power method, or a Krylov subspace method. The left eigenvector obtained may not be normalized and needs to be normalized so that it satisfies the probability distribution condition Σ_i π_i = 1. The normalized left eigenvector is the steady-state distribution π of the Markov chain. Through the above steps, it can be judged whether the Markov chain of battery operating mode transitions has a steady-state distribution, and the steady-state distribution π can be solved. The steady-state distribution π represents the long-run proportion of time the battery spends in each operating mode during long-term operation. This information reflects the actual operating state and usage characteristics of the battery and provides an important basis for evaluating the state of health and predicting the remaining life of the battery.
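Of the numerical algorithms named above, the power method is the simplest to sketch. The following illustration repeatedly multiplies a row vector by P_norm and renormalizes it. The function name, tolerance and example matrix are hypothetical.

```python
def stationary_by_power_method(P_norm, tol=1e-12, max_iter=10000):
    """Approximate the left eigenvector pi with pi P_norm = pi, sum(pi) = 1."""
    n = len(P_norm)
    pi = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P_norm[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]       # normalize so the entries sum to 1
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi                                # best estimate if not converged


# Hypothetical two-mode chain whose exact steady-state distribution is (5/6, 1/6).
P_norm = [[0.9, 0.1], [0.5, 0.5]]
pi = stationary_by_power_method(P_norm)
```

For this matrix the balance condition 0.1·π₁ = 0.5·π₂ together with π₁ + π₂ = 1 gives π = (5/6, 1/6), which the iteration approaches geometrically. For a chain that does not satisfy the stationary distribution condition, such as a periodic chain, this iteration need not converge.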

In the Markov chain model of battery operating mode transitions, if the state transition matrix P_norm does not satisfy the stationary distribution condition, a numerical optimization method can be used to estimate the steady-state distribution π of the Markov chain. The specific implementation is as follows. An optimization objective function J(π) is constructed, which represents the difference between the state distribution and its product with the state transition matrix, together with the differences from the prior distribution and the ideal distribution: J(π) = ||π - πP_norm||² + λ||π - π₀||² + μ||π - π_f||². The first term ||π - πP_norm||² represents the difference between the state distribution π and its product with the state transition matrix P_norm; this term is 0 when π satisfies the steady-state distribution condition of the Markov chain. The second term λ||π - π₀||² represents the difference between the estimated steady-state distribution π and the prior distribution π₀; introducing the prior distribution improves the stability and robustness of the estimate. The prior distribution π₀ may be set based on historical operating data of the battery or expert experience. The third term μ||π - π_f||² represents the difference between the estimated steady-state distribution π and the ideal distribution π_f; introducing the ideal distribution guides the estimation result toward the desired direction. The ideal distribution π_f can be set according to battery performance and safety requirements.

The gradient of the objective function with respect to the steady-state distribution is calculated. The gradient calculation formula is ∇J(π) = 2(π - πP_norm)(I - P_norm^T) + 2λ(π - π₀) + 2μ(π - π_f), where I is the identity matrix and P_norm^T is the transpose of the state transition matrix P_norm. A gradient descent algorithm is then used to iteratively update the steady-state distribution π so as to reduce the value of the optimization objective function. The update formula is π^(k+1) = π^k - α_k v^(k+1), where v^k represents the momentum term of the k-th iteration, β is the momentum factor that controls the influence of historical gradients, and α_k represents the learning rate of the k-th iteration; either a constant learning rate or an adaptive learning rate (e.g., AdaGrad, RMSprop or Adam) may be used.

An iteration termination condition is set, and the iterative process terminates when any one of the following conditions is met: the number of iterations reaches the preset maximum number of iterations K_max; the change in the objective function value is no greater than a preset threshold ε, i.e., |J(π^(k+1)) - J(π^k)| ≤ ε; the change in the state distribution is no greater than the preset threshold ε, i.e., ||π^(k+1) - π^k|| ≤ ε; the difference between the state distribution and its product with the transition matrix is no greater than a preset threshold ε_p, i.e., ||πP_norm - π|| ≤ ε_p; the difference between the state distribution and the ideal distribution is no greater than a preset threshold ε_f, i.e., ||π - π_f|| ≤ ε_f. The state distribution at the time the iteration terminates is taken as the estimate of the steady-state distribution of the Markov chain. By introducing the prior distribution and the ideal distribution and combining them with the gradient descent algorithm, the steady-state distribution of the Markov chain can be estimated even when the state transition matrix does not satisfy the stationary distribution condition. This numerical optimization method jointly accounts for the consistency between the state distribution and the transition matrix, the prior knowledge, and the desired characteristics, and can therefore produce a more stable and reasonable estimate.
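The estimation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the function names are hypothetical, the velocity rule v ← βv + ∇J is assumed (the description names β but does not give the rule), and clipping plus renormalization is an added assumption that keeps π a valid probability vector. The default parameter values are the ones quoted in the second embodiment of this description, and only the first three termination conditions are checked.

```python
import math


def _residual(pi, P):
    """Return pi - pi P for a row vector pi and a row-stochastic matrix P."""
    n = len(pi)
    return [pi[j] - sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def objective(pi, P, pi0, pif, lam, mu):
    """J(pi) = ||pi - pi P||^2 + lam*||pi - pi0||^2 + mu*||pi - pif||^2."""
    r = _residual(pi, P)
    return (sum(x * x for x in r)
            + lam * sum((a - b) ** 2 for a, b in zip(pi, pi0))
            + mu * sum((a - b) ** 2 for a, b in zip(pi, pif)))


def gradient(pi, P, pi0, pif, lam, mu):
    """2 (pi - pi P)(I - P^T) + 2 lam (pi - pi0) + 2 mu (pi - pif)."""
    n = len(pi)
    r = _residual(pi, P)
    return [2 * (r[i] - sum(r[j] * P[i][j] for j in range(n)))
            + 2 * lam * (pi[i] - pi0[i])
            + 2 * mu * (pi[i] - pif[i])
            for i in range(n)]


def estimate_steady_state(P, pi0, pif, lam=0.15, mu=0.25, beta=0.85,
                          alpha=0.008, k_max=1500, eps=1e-7):
    """Momentum gradient descent on J(pi), starting from the prior pi0."""
    pi = list(pi0)
    v = [0.0] * len(pi)
    j_prev = objective(pi, P, pi0, pif, lam, mu)
    for _ in range(k_max):  # condition 1: at most k_max iterations
        g = gradient(pi, P, pi0, pif, lam, mu)
        # ASSUMPTION: classical momentum rule, v <- beta * v + gradient.
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        new_pi = [p - alpha * vi for p, vi in zip(pi, v)]
        # ASSUMPTION: clip and renormalize so pi stays a probability vector.
        new_pi = [max(p, 0.0) for p in new_pi]
        total = sum(new_pi)
        new_pi = [p / total for p in new_pi]
        j_now = objective(new_pi, P, pi0, pif, lam, mu)
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_pi, pi)))
        pi = new_pi
        # conditions 2 and 3: small change in J or in pi
        if abs(j_now - j_prev) <= eps or step <= eps:
            break
        j_prev = j_now
    return pi
```

Calling `estimate_steady_state(P, pi0, pif)` with a row-stochastic `P` returns a list of mode probabilities that sums to 1. The two remaining termination conditions (thresholds ε_p and ε_f) could be added to the same `if` test in the same way.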

Specifically, according to the battery working mode division result identified in table 2, the transfer frequency between different working modes is counted, and a transfer frequency matrix a is generated, see table 3:

TABLE 3 transfer frequency matrix A

From \ To	Mode-1	Mode 1	Mode 2
Mode-1	0	1	1
Mode 1	1	2	2
Mode 2	1	2	1

As can be seen from the data, there is a transition in the battery between the different modes of operation. For example, a transition from mode-1 to mode 1 and mode 2 occurs 1 time each, a transition from mode 1 to mode-1 occurs 1 time, a transition within mode 1 occurs 2 times, a transition from mode 1 to mode 2 occurs 2 times, and so on.
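Counting the transfer frequencies can be sketched as below. The function name and the label sequence are hypothetical: the sequence is not taken from the battery data, it is simply one ordering of mode labels whose consecutive pairs reproduce the counts in Table 3 (rows are the source mode, columns the target mode).

```python
def transition_counts(labels, modes):
    """Count transitions between consecutive working-mode labels.

    labels: time-ordered sequence of mode labels (e.g. cluster labels).
    modes:  ordered list of distinct modes; fixes the row/column order.
    Returns matrix A where A[i][j] is the number of i -> j transitions.
    """
    index = {mode: i for i, mode in enumerate(modes)}
    A = [[0] * len(modes) for _ in modes]
    for src, dst in zip(labels, labels[1:]):
        A[index[src]][index[dst]] += 1
    return A


# Hypothetical label sequence; -1 denotes the mode labelled "Mode-1".
example_labels = [1, 1, 1, 2, 2, 1, -1, 2, -1, 1, 2, 1]
A = transition_counts(example_labels, modes=[-1, 1, 2])
```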

From the transition frequency matrix a, a transition probability matrix P is generated, see table 4:

TABLE 4 transition probability matrix P

The transition probability matrix P represents the probability of the battery transitioning from one working mode to another. For example, when the battery is in mode-1, the probabilities of transitioning to mode 1 and to mode 2 are both 0.5; when the battery is in mode 1, the probability of transitioning to mode-1 is 0.2, the probability of remaining in mode 1 is 0.4, and the probability of transitioning to mode 2 is 0.4; and so on. The transition probability matrix P is regularized to obtain the regularized transition probability matrix P_norm. The regularization ensures that the elements in each row of the transition probability matrix sum to 1, as a probability distribution requires. The regularized transition probability matrix P_norm is taken as the state transition matrix of the Markov chain to construct the Markov chain model of battery working mode transitions. Eigenvalue decomposition of the state transition matrix P_norm gives λ_1 = 1.0000, λ_2 = 0.2463, λ_3 = -0.2413; the maximum eigenvalue λ_1 is unique and equal to 1, so the stationary distribution condition is satisfied. The left eigenvector of the state transition matrix P_norm corresponding to λ_1 is solved and normalized to obtain the steady-state distribution of the Markov chain: π = [0.1982, 0.4059, 0.3959]. The steady-state proportions of the battery in the 3 working modes are therefore: mode-1: 19.82%; mode 1: 40.59%; mode 2: 39.59%.
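A minimal sketch of these two steps follows, with two stated substitutions: plain row normalization stands in for the regularization step, whose exact form the description does not specify, and power iteration stands in for an explicit eigenvalue decomposition (for an irreducible, aperiodic chain it converges to the same normalized left eigenvector for eigenvalue 1). Function names are hypothetical, and because the regularization is unspecified, the values this sketch produces need not coincide with the figures quoted above.

```python
def row_normalize(A):
    """Turn a transfer frequency matrix into row-stochastic probabilities."""
    P = []
    for row in A:
        total = sum(row)
        if total == 0:
            raise ValueError("a mode with no outgoing transitions cannot be normalized")
        P.append([count / total for count in row])
    return P


def stationary_by_power_iteration(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P from the uniform distribution until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Transfer frequency matrix A from Table 3 (row/column order: Mode-1, Mode 1, Mode 2).
A = [[0, 1, 1], [1, 2, 2], [1, 2, 1]]
P = row_normalize(A)
pi = stationary_by_power_iteration(P)
```

The first two rows of `P` reproduce the probabilities stated in the text (0.5 and 0.5 for mode-1; 0.2, 0.4 and 0.4 for mode 1).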

The regularized transition probability matrix P_norm and the steady-state distribution π of the battery in the different working modes are then output.

In another embodiment, the transfer frequency matrix A is shown in table 5:

TABLE 5 transfer frequency matrix A

From \ To	Mode 1	Mode 2	Mode 3
Mode 1	12	5	3
Mode 2	6	18	4
Mode 3	2	3	9

From the transition frequency matrix a, a transition probability matrix P is calculated, see table 6:

TABLE 6 transition probability matrix P

From \ To	Mode 1	Mode 2	Mode 3
Mode 1	0.6	0.25	0.15
Mode 2	0.2142	0.6428	0.143
Mode 3	0.143	0.2142	0.6428

Regularization treatment is carried out on the transition probability matrix P, and a regularized transition probability matrix P_norm is obtained, wherein the regularized transition probability matrix P_norm is shown in a table 7:

TABLE 7 transition probability matrix P_norm after regularization

The regularized transition probability matrix P_norm is taken as the state transition matrix of the Markov chain, and eigenvalue decomposition of P_norm gives the eigenvalues λ_1 = 1.0000, λ_2 = 1.0000, λ_3 = 0.4000. The maximum eigenvalue is not unique, so the stationary distribution condition is not satisfied, and the steady-state distribution π of the Markov chain is estimated with the numerical optimization method. The prior distribution π₀ and the ideal distribution π_f are set to π₀ = [0.35, 0.40, 0.25] and π_f = [0.25, 0.45, 0.30]. The optimization algorithm parameters are set as follows: maximum number of iterations K_max = 1500, convergence threshold ε = 1e-7, balance factors λ = 0.15 and μ = 0.25, momentum factor β = 0.85, initial learning rate α = 0.008. The steady-state distribution π is updated iteratively with the gradient descent algorithm, the gradient of the optimization objective function being calculated at each iteration, and the estimate obtained when the iteration termination condition is met is π_est = [0.2837, 0.4421, 0.2742]. Since the maximum eigenvalue is still not unique, the numerical optimization method can continue to be used to estimate the steady-state distribution.

A neural network-based fault analysis model is constructed, with the working mode division result, the key features and the combined features as input and the fault state of the battery as output, and the fault analysis model is trained with a supervised learning algorithm. Specifically, the training data is prepared as follows: the working mode division result, the key features and the combined features are combined into an input feature vector; fault state information of the battery, such as normal, mild fault and severe fault, is collected as the output label; and the input feature vectors and their corresponding output labels form the training data set. Feature preprocessing: the input features are standardized or normalized so that their scales are consistent; feature selection or dimensionality reduction is performed as needed to remove redundant or irrelevant features and improve the generalization performance of the model.
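The standardization step can be illustrated with a z-score transform applied per feature column. This is a sketch under assumptions: the function name is hypothetical, the population standard deviation is used, and constant columns are left at zero rather than divided by zero. In practice the means and deviations would be computed on the training set only and reused for new data.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column of a list of equal-length feature vectors."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead.
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)]
            for row in rows]


# Two hypothetical feature vectors: [average current, cycle count].
scaled = standardize([[1.0, 10.0], [3.0, 10.0]])
```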

A machine learning algorithm suitable for battery fault analysis is selected, such as a decision tree, random forest, support vector machine or neural network, and the fault analysis model is constructed accordingly. For example: decision tree model: adjust parameters such as the maximum depth of the tree, the minimum number of samples per leaf node and the split quality measure; random forest model: adjust parameters such as the number of trees, the maximum number of features and the sample sampling ratio; support vector machine model: select among different kernel functions (such as a linear kernel or a Gaussian kernel) and adjust the regularization parameter and the kernel function parameters; neural network model: design the network structure (such as the number of layers and the number of neurons in each layer), the choice of activation function, the regularization method and so on. The training data set is divided into a training set and a validation set using k-fold cross-validation or the hold-out method; the constructed fault analysis model is trained on the training set, with the model parameters optimized by a supervised learning algorithm; the performance of the model is evaluated on the validation set using indicators such as accuracy, precision, recall and F1 score; and the structure and parameters of the model are further adjusted according to the validation results to improve its generalization performance. Multiple trained fault analysis models may be integrated using strategies such as voting, averaging or weighted averaging; an integrated model is generally more robust and generalizes better than a single model.

The performance of the integrated fault analysis model is evaluated on an independent test set to verify its effectiveness in practical applications; the samples that the model misclassifies are analyzed to identify possible directions for improvement, such as introducing new features or optimizing the model structure.
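The evaluation indicators named above can be computed as in the following sketch. It assumes a one-vs-rest view in which one fault label is treated as the positive class; the function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical validation labels and model predictions.
y_true = ["normal", "mild", "severe", "normal", "mild"]
y_pred = ["normal", "mild", "mild", "normal", "normal"]
metrics = classification_metrics(y_true, y_pred, positive="mild")
```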

Specifically, the application adopts a random forest model as the machine learning model for battery fault analysis, which comprises the following steps. Historical operating data of the battery is collected and organized, including feature data (such as current, voltage and temperature) and the corresponding fault label data; the data is preprocessed, for example by outlier removal, missing value handling and normalization; and the data set is divided into a training set, a validation set and a test set. Key features related to battery faults, such as average current, capacity change rate and temperature change rate, are analyzed and selected, and operations such as feature combination and feature transformation are performed to construct a more effective feature representation. The basic parameters of the random forest are determined, such as the number of decision trees, the maximum depth of each tree, the minimum number of samples per leaf node and the feature sampling ratio; sample subsets are drawn at random from the training set by Bootstrap sampling, and a plurality of decision trees are constructed; when each decision tree is constructed, a random feature subset (generally of size equal to the square root of the total number of features) is used to split the nodes, which increases the randomness and diversity of the model; and no pruning is performed while the decision trees grow, so that each tree grows as close to the maximum depth as possible.

The random forest model is trained on the training set, and the fault label of each sample is predicted by an integration strategy over the decision trees (such as majority voting); the performance of the model is evaluated on the validation set, with its classification quality measured by indicators such as accuracy, precision, recall and F1 score; the hyperparameters of the random forest (such as the number of decision trees, the maximum depth and the minimum number of samples per leaf node) are tuned by grid search, random search or similar methods to find the best parameter combination; and the random forest model is retrained with the tuned hyperparameters to obtain the final battery fault analysis model. The performance of the trained random forest model is evaluated on an independent test set to verify its generalization ability on unseen data; the samples that the model misclassifies are analyzed to identify possible directions for improvement, such as introducing new features or adjusting sample weights; and the feature importances are calculated to assess the contribution of each feature to battery fault prediction and to select the most discriminative feature subset. The trained random forest model is deployed in an actual battery fault analysis system to perform real-time fault prediction on newly acquired battery data; new battery operating data and fault labels are collected periodically, and the model is retrained and updated to adapt to changes in the battery's operating environment and to new fault types.
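The sketch below isolates the three ingredients that distinguish a random forest from a single decision tree: Bootstrap sampling, a random feature subset of roughly square-root size, and majority voting. It is deliberately not a complete random forest (tree growing is omitted), and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (Bootstrap sampling)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) distinct feature indices for a split."""
    k = max(1, round(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Combine the labels predicted by the individual trees for one sample."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the draws are repeatable
rows_for_one_tree = bootstrap_indices(10, rng)
features_for_one_split = random_feature_subset(9, rng)
label = majority_vote(["normal", "mild fault", "normal"])
```

Each tree would be trained on the rows in `rows_for_one_tree`, and each node split would consider only the columns in `features_for_one_split`; the forest's prediction for a sample is the label returned by `majority_vote`.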

The battery data to be processed is input into the trained fault analysis model to predict the fault probability of the battery. Specifically, the battery data to be processed is acquired, such as time series data of current, voltage and temperature, and preprocessing operations such as denoising and outlier removal are applied so that the data matches the format of the model training data. Based on the preprocessed battery data, the same key features and combined features as in the model training data are extracted and combined into a feature vector consistent with the model input. The extracted feature vector is input into the trained working mode division model (such as the clustering model) to obtain the division of the battery data into the different working modes. The working mode division result, the extracted key features and the combined features are input into the trained fault analysis model; for an integrated fault analysis model, the features are input into each sub-model to obtain its prediction result, and the sub-model predictions are combined according to the integration strategy (such as voting, averaging or weighted averaging) to obtain the final predicted fault probability. The predicted battery fault probability is output as the final result of the battery analysis method; a fault probability threshold can also be set, so that whether the battery is at risk of failure is judged from the predicted fault probability and a corresponding health state assessment is given.

The predicted battery fault probability is presented visually in the form of charts, reports and the like, so that the user can intuitively understand the health state of the battery; a battery health state analysis report is generated, covering the fault probability, the health state assessment, an analysis of potential risk factors and so on, to support subsequent maintenance decisions.
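Combining sub-model outputs and applying a fault probability threshold can be sketched as follows. The function names, the example probabilities, the weights and the 0.5 threshold are all hypothetical; the description leaves the threshold and the weights to be chosen for the application.

```python
def ensemble_fault_probability(sub_model_probs, weights=None):
    """Average (or weighted-average) the sub-models' fault probabilities."""
    if weights is None:
        weights = [1.0] * len(sub_model_probs)
    return (sum(w * p for w, p in zip(weights, sub_model_probs))
            / sum(weights))


def assess_health(fault_probability, threshold=0.5):
    """Map a fault probability to a coarse health state; threshold is assumed."""
    return "at risk" if fault_probability >= threshold else "healthy"


# Hypothetical outputs of three sub-models for one battery.
probs = [0.2, 0.4, 0.9]
plain = ensemble_fault_probability(probs)
weighted = ensemble_fault_probability(probs, weights=[1.0, 1.0, 2.0])
status = assess_health(weighted)
```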

Claims

1. A battery analysis method, comprising:

S1, obtain battery data from the energy storage battery management system BMS and pre-process the collected data; the data includes current, voltage, temperature, capacity, number of cycles and cumulative usage time;

S2, extracting key features from the preprocessed data; wherein the key features include mean current, capacity change rate, temperature change rate, number of cycles and cumulative usage time;

S3, constructing a combined feature based on the key features; wherein the combined feature includes an interaction term between the mean current and the temperature, and a ratio of the capacity change rate to the number of cycles;

S4, clustering the working state of the battery using the DBSCAN clustering algorithm according to the mean current, capacity change rate and temperature change rate, and obtaining the working mode division result of the battery;

S5, according to the working modes obtained by clustering, counting the transition frequencies between different working modes, and calculating the transition probability matrix, which is used to characterize the probability of the battery switching between different working modes;

S6, construct a fault analysis model based on a neural network, take the working mode division results, key features and combined features as input, take the battery fault status as output, and use a supervised learning algorithm to train the fault analysis model;

S7, inputting the battery data to be processed into the trained fault analysis model to predict the failure probability of the battery.

2. The battery analysis method according to claim 1, characterized in that:

S3, constructing a combined feature based on the key features; wherein the combined feature includes the interaction term between the mean current and the temperature, and the ratio of the capacity change rate to the number of cycles, including:

The average current and temperature are multiplied to obtain the interaction term data of the average current and temperature;

Divide the capacity change rate by the number of cycles to obtain the ratio of the capacity change rate to the number of cycles;

The interaction data of average current and temperature, and the ratio data of capacity change rate and cycle number are combined to generate a combined feature.

3. The battery analysis method according to claim 2, characterized in that:

S4, according to the mean current, capacity change rate and temperature change rate, the DBSCAN clustering algorithm is used to cluster the working state of the battery to obtain the battery working mode division results, including:

S41, obtaining the mean current, capacity change rate and temperature change rate from the key features obtained in step S2, and combining them into three-dimensional data points to constitute the input data of the DBSCAN clustering algorithm;

S42, setting weight coefficients of the mean current, capacity change rate, and temperature change rate by a feature weight calculation method; and calculating the weighted Euclidean distance between the three-dimensional data points according to the weight coefficients;

S43, setting the neighborhood radius ε and neighborhood density threshold MinPts of the DBSCAN clustering algorithm;

S44, for each input data point, calculate the number of data points in the corresponding ε-neighborhood according to the weighted Euclidean distance; if the number of data points in the ε-neighborhood of the input data point is greater than or equal to MinPts, mark the corresponding input data point as a core point;

S45, taking each core point as a starting point, recursively dividing the core points and non-core points in the ε-neighborhood into the same cluster; wherein the non-core points are divided into the cluster where the core point with the closest weighted Euclidean distance is located; and the input data points that are not divided into any cluster are marked as noise points;

S46, according to the clustering result, the clustering label to which each input data point belongs is used as the corresponding working mode category, the noise point is used as the abnormal working state, and the working mode division result of the battery is output.

4. The battery analysis method according to claim 3, characterized in that:

S5, according to the working modes obtained by clustering, the transition frequencies between different working modes are counted, and the transition probability matrix is calculated. The transition probability matrix is used to characterize the probability of the battery switching between different working modes, including:

S51, according to the result of the battery working mode division, counting the transfer frequencies between different working modes, and generating a transfer frequency matrix A; wherein the matrix element a_ij represents the frequency of transfer from working mode i to working mode j;

S52, generating a transition probability matrix P according to the transition frequency matrix A; wherein the element p_ij in the transition probability matrix P represents the probability of transitioning from working mode i to working mode j;

S53, performing regularization processing on the generated transfer probability matrix P to obtain a regularized transfer probability matrix P_norm;

S54, solving the steady-state distribution π of the Markov chain according to the regularized transfer probability matrix P_norm, and obtaining the steady-state distribution of the battery in different working modes;

S55, outputting the transfer probability matrix P_norm and the steady-state distribution π of the battery under different working modes.

5. The battery analysis method according to claim 4, characterized in that:

S54, according to the regularized transfer probability matrix P_norm, solve the steady-state distribution π of the Markov chain to obtain the steady-state distribution of the battery in different working modes, including:

The regularized transition probability matrix P_norm is used as the state transition matrix of the Markov chain to construct a Markov chain model for battery working mode transition.

Perform eigenvalue decomposition on the state transfer matrix P_norm of the Markov chain to determine whether the state transfer matrix meets the stationary distribution condition. The stationary distribution condition is that the maximum eigenvalue is unique and equal to 1;

If the stationary distribution condition is met, the corresponding left eigenvector is solved according to the state transfer matrix P_norm, and the left eigenvector is normalized to obtain the steady-state distribution π of the Markov chain;

If the stationary distribution condition is not met, a numerical optimization method is used to estimate the steady-state distribution π of the Markov chain.

6. The battery analysis method according to claim 5, characterized in that:

Numerical optimization methods are used to estimate the steady-state distribution of Markov chains, including:

Construct an optimization objective function, the optimization objective being to minimize the difference between the state distribution and the product of the state distribution and the state transfer matrix; wherein the state distribution is the steady-state distribution π of the Markov chain to be estimated, and the state transfer matrix is the state transfer matrix P_norm of the constructed Markov chain;

Based on the optimization objective function, the gradient of the steady-state distribution is calculated;

Adopt the gradient descent algorithm to iteratively update the steady-state distribution π;

Set the iteration termination condition. After the iteration is terminated, the current steady-state distribution π^k is used as the estimated value of the Markov chain steady-state distribution.

7. The battery analysis method according to claim 6, characterized in that:

The expression of the optimization objective function is as follows:

minimize J(π) = ||π - πP_norm||² + λ||π - π₀||² + μ||π - π_f||²

Wherein, π₀ is the prior distribution of battery failure mode, which is set according to historical failure data; π_f is the ideal distribution of battery failure mode, which represents the expected failure mode distribution and is set according to battery performance and safety requirements; λ and μ are balancing factors, which control the weights of the prior distribution term and the ideal distribution term, respectively.

8. The battery analysis method according to claim 7, characterized in that:

The gradient of the steady-state distribution is calculated using the following formula:

Wherein, I is the n×n dimensional identity matrix, and P_norm^T is the transposed matrix of the regularized state transfer matrix P_norm.

9. The battery analysis method according to claim 8, characterized in that:

The steady-state distribution is updated by the following formula:

π^(k+1) = π^k - α_k × v^(k+1)

Wherein, v^k represents the momentum term of the k-th iteration; β is the momentum factor, which controls the influence of the historical gradient; α_k represents the adaptive learning rate of the k-th iteration, and the AdaGrad, RMSprop or Adam optimization algorithm is used to adaptively adjust the learning rate.

10. The battery analysis method according to claim 9, characterized in that:

The iteration termination conditions include any of the following:

k ≥ K_max

|J(π^(k+1)) - J(π^k)| ≤ ε

||π^(k+1) - π^k|| ≤ ε

||πP_norm - π|| ≤ ε_p

||π - π_f|| ≤ ε_f

Wherein, K_max is the preset maximum number of iterations, and ε is the preset convergence threshold; when any iteration termination condition is met, the iteration process is terminated, and the current steady-state distribution estimate π^k is used as the estimated result of the Markov chain steady-state distribution; ε_p is the threshold of the stability of the battery failure mode transfer, and ε_f is the similarity threshold between the steady-state distribution and the ideal failure mode distribution.