CN118262844A

CN118262844A - Ensemble learning-based design method for high-hardness high-entropy alloy compositions

Info

Publication number: CN118262844A
Application number: CN202410448959.5A
Authority: CN
Inventors: Zhou Guozhi (周国治); Li Chao (李超); Luan Jun (栾俊); Yu Zhigang (于之刚)
Original assignee: Zhejiang Weixiang Material Technology Co ltd; University of Shanghai for Science and Technology
Current assignee: Zhejiang Weixiang Material Technology Co ltd; University of Shanghai for Science and Technology
Priority date: 2024-04-15
Filing date: 2024-04-15
Publication date: 2024-06-28

Abstract

The invention relates to the technical field of high-entropy alloy hardness prediction, and more specifically to an ensemble learning-based method for designing high-hardness high-entropy alloy compositions. The method comprises the following steps: obtaining a data set for predicting the hardness of a high-entropy alloy system; autoencoding the composition data in the initial data set to obtain a latent space of the hardness distribution of the high-entropy alloy; sampling the latent space with a Gaussian mixture model to obtain candidate sample points for prediction; building multiple machine learning models on different feature combinations, training and evaluating them, and screening out the qualified models; combining the qualified models into an ensemble model by an ensemble learning method and using it to predict the hardness of the candidate sample points; and finally preparing and testing the selected alloys, thereby realizing the design of high-hardness high-entropy alloy compositions.

Description

A method for designing high-hardness high-entropy alloy compositions based on ensemble learning

Technical Field

The invention relates to the technical field of high-entropy alloy hardness prediction, and more specifically to an ensemble learning-based method for designing high-hardness high-entropy alloy compositions.

Background Art

High-entropy alloys are a class of alloys composed of five or more principal elements in similar atomic percentages. They have attracted wide attention because of their unique microstructures and excellent physical, chemical and mechanical properties. Compared with traditional alloys, high-entropy alloys show high hardness, excellent corrosion resistance and good high-temperature performance, which makes them potentially valuable in aerospace, military, automotive manufacturing and energy applications.

Despite their many advantages, high-entropy alloys still pose challenges for alloy design. Traditional design methods often rely on trial and error, which is time-consuming and costly and makes it difficult to explore and optimize complex composition spaces systematically.

Data-driven methods, represented by machine learning, play an increasingly prominent role in alloy research and development. By establishing complex relationships between input features and material targets, machine learning can predict material properties rapidly and therefore plays an important role in guiding the design of materials, including high-entropy alloys.

Chinese patent CN116092604B proposes a data-driven high-strength, high-toughness refractory high-entropy alloy and its preparation method; it mines information from a high-entropy alloy data set and predicts composition points that may have the target properties, which greatly accelerates high-entropy alloy composition design and optimization. However, current machine learning methods in materials science mainly use a single model to model and predict material properties or chemical structures. Although these methods achieve a certain predictive effect, their predictions are not accurate enough, and they can only make simple predictions, so designing the specific composition of a high-entropy alloy remains difficult. Moreover, although machine learning algorithms have strong learning and predictive capabilities, when the training data set contains fewer than 300 entries these models can only guarantee prediction accuracy within the range of the existing composition distribution and cannot explore unknown regions of composition space. Ensemble learning methods integrate multiple machine learning models to obtain better predictions than any single learner.

In summary, the ensemble learning-based method for designing high-hardness high-entropy alloy compositions developed in the present invention has important theoretical significance and application value, and plays a vital role in advancing materials science and in enabling the rapid iteration of new high-hardness high-entropy alloys.

Summary of the Invention

The present invention provides an ensemble learning-based method for designing high-hardness high-entropy alloy compositions. A differential autoencoder algorithm is used to obtain the latent space of the hardness distribution of the high-entropy alloy, and a Gaussian mixture model combined with a Markov chain Monte Carlo method samples this latent space to select the sample points that take part in candidate screening. An ensemble learning method then combines multiple machine learning models to predict the hardness of these sample points, and finally high-hardness high-entropy alloy composition points are selected through an efficiency function. The method addresses three problems: the low prediction accuracy of a single machine learning model, the difficulty of finding high-hardness high-entropy alloy composition points, and the difficulty of exploring unknown regions of the high-entropy alloy composition space.

The specific technical solutions of the present invention are as follows:

An ensemble learning-based method for designing high-hardness high-entropy alloy compositions, comprising the following steps:

S1: Obtain a data set for predicting the hardness of high-entropy alloys based on a high-entropy alloy system. The system is Al-Co-Cr-Cu-Fe-Ni; multiple compositions and their corresponding hardness values are collected for this system and combined with physical features of the high-entropy alloys to form a data set for further analysis. The data set contains 100-300 entries;

Twenty physical features are used to describe the basic properties of the alloys and the factors affecting their performance: atomic radius difference (δr), electronegativity difference (Δχ), valence electron concentration (VEC), mixing enthalpy (ΔH), configurational entropy (ΔS), Ω parameter (Ω), Λ parameter (Λ), γ parameter (γ), local electronegativity mismatch (D.χ), number of mobile electrons (e/a), cohesive energy (Ec), modulus mismatch (η), local size mismatch (D.r), energy term (A), Nabarro coefficient (F), work function (W), shear modulus (G), shear modulus difference (δG), local modulus mismatch (D.G), and lattice distortion energy (μ). These features are calculated with the formulas and elemental property values given in the existing literature (Zou Rui, "Hardness prediction of AlCoCrCuFeNi high-entropy alloys based on machine learning") and are merged with the composition data to form the initial data set for further analysis;
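As an illustration of how a few of these descriptors follow from composition, the sketch below computes δr, Δχ, VEC and ΔS with the conventional high-entropy alloy definitions: δr = 100·sqrt(Σ cᵢ(1 − rᵢ/r̄)²), Δχ = sqrt(Σ cᵢ(χᵢ − χ̄)²), VEC = Σ cᵢ·VECᵢ and ΔS = −R Σ cᵢ ln cᵢ. The element property table is an assumption (commonly tabulated radii, Pauling electronegativities and valence electron counts), not the table from the cited literature, which should be substituted in practice; the name `basic_features` is hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

# Illustrative element properties (assumption, not the cited literature's table):
# (atomic radius in angstrom, Pauling electronegativity, valence electron count)
PROPS = {
    "Al": (1.432, 1.61, 3),
    "Co": (1.251, 1.88, 9),
    "Cr": (1.249, 1.66, 6),
    "Cu": (1.278, 1.90, 11),
    "Fe": (1.241, 1.83, 8),
    "Ni": (1.246, 1.91, 10),
}


def basic_features(comp):
    """Compute delta_r (%), delta_chi, VEC and configurational entropy for a composition.

    comp maps element symbol -> molar amount; amounts are normalized to fractions.
    """
    total = sum(comp.values())
    c = {el: x / total for el, x in comp.items() if x > 0}
    r_bar = sum(ci * PROPS[el][0] for el, ci in c.items())
    chi_bar = sum(ci * PROPS[el][1] for el, ci in c.items())
    delta_r = 100 * math.sqrt(sum(ci * (1 - PROPS[el][0] / r_bar) ** 2 for el, ci in c.items()))
    delta_chi = math.sqrt(sum(ci * (PROPS[el][1] - chi_bar) ** 2 for el, ci in c.items()))
    vec = sum(ci * PROPS[el][2] for el, ci in c.items())
    s_conf = -R * sum(ci * math.log(ci) for ci in c.values())
    return {"delta_r": delta_r, "delta_chi": delta_chi, "VEC": vec, "S_conf": s_conf}


# Example: Al0.5CoCrCuFeNi
print(basic_features({"Al": 0.5, "Co": 1, "Cr": 1, "Cu": 1, "Fe": 1, "Ni": 1}))
```

ΔH and most of the remaining descriptors need further tabulated data (for example pairwise binary mixing enthalpies or elastic moduli) and are omitted here.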

S2: Autoencode the composition data in the data set to obtain the latent space of the hardness distribution of the high-entropy alloy;

S3: Sample the latent space with a Gaussian mixture model, screening 1000-3000 samples to obtain the sample points involved in prediction;

S4: Build multiple machine learning models using different feature combinations, train and evaluate them, screen out the qualified models, and combine these into an ensemble model;

S5: Use the ensemble model to predict the hardness of the participating sample points;

S6: Rank the candidates with the efficiency function, select experimental points, prepare the samples and test their hardness, which must reach 800 HV or above. If the measured hardness is below 800 HV, S2 to S5 are repeated, retraining, validating and screening the models on the updated data set to optimize the alloy composition design, until a high-hardness high-entropy alloy composition is achieved. If the hardness exceeds 800 HV, further performance evaluation and application exploration are carried out.
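Steps S2 to S6 form a closed design loop. A minimal sketch, in which `propose` (standing in for model retraining, latent-space sampling and UCB ranking) and `measure` (standing in for alloy preparation and hardness testing) are hypothetical callables, and `>=` is an assumed reading of "reaches 800 HV or above":

```python
from typing import Callable, List, Optional, Tuple

TARGET_HV = 800.0  # hardness target from S6

Entry = Tuple[dict, float]  # (composition, measured hardness in HV)


def design_loop(
    dataset: List[Entry],
    propose: Callable[[List[Entry]], dict],  # hypothetical: retrain models, sample, rank by UCB
    measure: Callable[[dict], float],        # hypothetical: prepare alloy and test hardness
    max_rounds: int = 10,
) -> Tuple[Optional[dict], Optional[float], int]:
    """Repeat propose -> measure, adding every measured point to the data set."""
    for round_no in range(1, max_rounds + 1):
        candidate = propose(dataset)
        hv = measure(candidate)
        dataset.append((candidate, hv))  # updated data set for the next round
        if hv >= TARGET_HV:
            return candidate, hv, round_no
    return None, None, max_rounds
```

The `max_rounds` cap is an added safeguard, not part of the described method.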

As a technical solution of the present invention, in S1, 278 compositions and their corresponding hardness values are collected for the Al-Co-Cr-Cu-Fe-Ni system, and the 20 physical features of the high-entropy alloys are calculated for each of them, giving 278 merged entries. Each entry has three parts: the fractions of the 6 elements, the 20 physical features, and the hardness, which is the target. The initial data set is therefore a 278-row × 27-column table. It is randomly divided into a training set and a test set in a ratio of 4:1, giving 222 training entries and 56 test entries.
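The table layout and the 4:1 split can be checked with a few lines of NumPy. The data matrix below is a zero placeholder with the stated shape; rounding the test share up (ceil(0.2 × 278) = 56) reproduces the 222/56 split. The seed and the helper name `split_indices` are assumptions.

```python
import numpy as np

N_ROWS, N_ELEMENTS, N_FEATURES = 278, 6, 20


def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Random train/test index split; the test share is rounded up."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_rows)
    n_test = int(np.ceil(n_rows * test_fraction))
    return perm[n_test:], perm[:n_test]


# Placeholder table: 6 element fractions + 20 physical features + 1 hardness column
data = np.zeros((N_ROWS, N_ELEMENTS + N_FEATURES + 1))
X, y = data[:, :-1], data[:, -1]
train_idx, test_idx = split_indices(N_ROWS)
print(data.shape, len(train_idx), len(test_idx))  # (278, 27) 222 56
```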

As a technical solution of the present invention, in S2, a differential autoencoder learns the latent representation of the Al-Co-Cr-Cu-Fe-Ni high-entropy alloy data set. Both the encoder and the decoder are neural networks: the encoder takes the input data and converts it into a lower-dimensional latent space, and the decoder reconstructs these latent representations into the original high-dimensional data. Specifically, the encoder maps the composition data into a two-dimensional latent space, the decoder maps the two-dimensional latent data back to composition data, and the training parameters of the autoencoder are adjusted according to the loss.

As a technical solution of the present invention, in S2, the latent space is divided into a high-hardness region (above 600 HV) and other regions: a neural network classifier is trained that labels the region above 600 HV as the high-hardness region and the region below 600 HV as the low-hardness region.

As a technical solution of the present invention, the network training parameters of the neural network classifier are: an initial learning rate of 0.0001, a batch size of 32, and 400 training epochs.

As a technical solution of the present invention, in S3, an initial sample is randomly drawn from the Gaussian mixture model and 10,000 iterations are performed. In each iteration, a proposed next sample is generated from the current sample using a multivariate normal distribution; the current and proposed samples are passed together to the classifier trained in S2, which outputs the classification probability of each. If the acceptance probability of the proposed sample, as given by the classifier, is higher than that of the current sample, the proposal is kept as the sample for the next iteration; otherwise it is discarded. This procedure yields 1,352 sample points for subsequent screening.

As a technical solution of the present invention, in S4, the models in the ensemble include, but are not limited to, SVR, CatBoost, LightGBM, back-propagation neural networks, random forest, XGBoost and AdaBoost.

As a technical solution of the present invention, in S4, models are trained on different training-set distributions using an iterative feature-addition method. The procedure starts from an initial subset containing only the element fractions and then adds new features one at a time; in each iteration, the change in the model's RMSE determines the final feature set. The RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (2)$$

where $y_i$ is the true hardness, $\hat{y}_i$ is the hardness predicted by the model, and $n$ is the number of samples.
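The RMSE definition and the iterative feature-addition procedure can be sketched as greedy forward selection. Assumptions: an ordinary least-squares fit stands in for the SVR, boosting and neural network models; the stopping rule is "stop when no remaining candidate lowers validation RMSE"; and the function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_predict(X_tr, y_tr, X_va):
    """Stand-in model (assumption): ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_va)), X_va]) @ coef


def forward_select(X_tr, y_tr, X_va, y_va, base_cols, candidate_cols):
    """Start from base_cols (element fractions) and add one feature per iteration."""
    selected = list(base_cols)
    best = rmse(y_va, fit_predict(X_tr[:, selected], y_tr, X_va[:, selected]))
    remaining = list(candidate_cols)
    while remaining:
        scores = {
            c: rmse(y_va, fit_predict(X_tr[:, selected + [c]], y_tr, X_va[:, selected + [c]]))
            for c in remaining
        }
        c_best = min(scores, key=scores.get)
        if scores[c_best] >= best:  # no candidate improves validation RMSE
            break
        selected.append(c_best)
        remaining.remove(c_best)
        best = scores[c_best]
    return selected, best
```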

As a technical solution of the present invention, in S5, the machine learning models trained in S4 predict the hardness of the high-entropy alloy composition points under screening, and the mean and standard deviation of the hardness values predicted by the ensemble model are calculated.

As a technical solution of the present invention, in S6, the efficiency function is the upper confidence bound (UCB) function:

$$\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x) \quad (3)$$

where μ(x) is the predicted mean at composition point x and σ(x) is the predicted standard deviation. To balance exploitation of the model against exploration of unknown points, κ is set to 0.2.
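The UCB formula and the ranking in S6 amount to a few lines. The sketch below assumes the ensemble mean and standard deviation for each candidate are already available as arrays; `rank_candidates` and `top_k` are hypothetical names.

```python
import numpy as np

KAPPA = 0.2  # exploration weight from the text


def ucb(mu, sigma, kappa=KAPPA):
    """UCB(x) = mu(x) + kappa * sigma(x), element-wise over candidates."""
    return np.asarray(mu, float) + kappa * np.asarray(sigma, float)


def rank_candidates(mu, sigma, top_k=5, kappa=KAPPA):
    """Indices and UCB scores of the top_k candidates, highest UCB first."""
    scores = ucb(mu, sigma, kappa)
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, scores[order]
```

For example, with μ = [700, 720, 690] HV and σ = [50, 10, 200] HV, the UCB scores are 710, 722 and 730, so the most uncertain candidate is ranked first even though its mean is lowest.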

Compared with the prior art, the present invention has the following beneficial effects:

1. The present invention obtains the latent space of the hardness distribution of high-entropy alloys with a differential autoencoder algorithm and samples it with a Gaussian mixture model combined with a Markov chain Monte Carlo method to select candidate sample points. An ensemble learning method then combines multiple machine learning models into an ensemble model that predicts the hardness of these sample points; the selected alloys are prepared and tested, and the design of high-hardness high-entropy alloy compositions is thereby realized.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of the high-entropy alloy composition design of the present invention;

FIG. 2 is a two-dimensional latent space diagram of the high-entropy alloy composition data set of the present invention;

FIG. 3 is a graph of the Spearman rank correlation coefficients between the different features of the present invention;

FIG. 4 is a graph of the RMSE of multiple machine learning models of the present invention under different training-set ratios.

Detailed Description of the Embodiments

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.

Embodiment 1:

Step 1: As shown in FIGS. 1-4, the high-entropy alloy system in the present invention is the Al-Co-Cr-Cu-Fe-Ni system, for which 278 compositions and their corresponding hardness values are collected.

At the same time, 20 physical features were identified to describe the basic properties of the alloys and the factors affecting their performance: atomic radius difference (δr), electronegativity difference (Δχ), valence electron concentration (VEC), mixing enthalpy (ΔH), configurational entropy (ΔS), Ω parameter (Ω), Λ parameter (Λ), γ parameter (γ), local electronegativity mismatch (D.χ), number of mobile electrons (e/a), cohesive energy (Ec), modulus mismatch (η), local size mismatch (D.r), energy term (A), Nabarro coefficient (F), work function (W), shear modulus (G), shear modulus difference (δG), local modulus mismatch (D.G), and lattice distortion energy (μ). These features are calculated with the formulas and elemental property values given in the literature (Zou Rui, "Hardness prediction of AlCoCrCuFeNi high-entropy alloys based on machine learning") and merged with the composition data to form the initial data set for further analysis.

Step 2: Autoencode the composition data in the data set to obtain the latent space of the hardness distribution of the high-entropy alloy, and divide the latent space into a high-hardness region (above 600 HV) and other regions;

It should be noted that the composition data set referred to herein is the composition part of the 278 entries, and the corresponding latent space is the two-dimensional space onto which the encoder of the differential autoencoder maps this high-dimensional data.

In the present invention, a differential autoencoder learns the latent representation of the Al-Co-Cr-Cu-Fe-Ni high-entropy alloy data set. The encoder accepts the input data and converts it into a lower-dimensional latent space, trying to retain the features of the data that are needed to reconstruct the input. The decoder, also a neural network, receives the latent encoding and attempts to reconstruct the input data; its goal is to produce an output as close as possible to the original input, so that the model learns an effective representation of the data. The optimal encoder and decoder network parameters are obtained by optimizing the loss function, after which the variational parameters are used to sample and reconstruct the signal. In a specific embodiment, the network training parameters are: an initial learning rate of 0.001, a weight decay of 0.0001, a batch size of 64, and 600 training epochs.

$$\mathrm{Loss} = \mathrm{Loss}_{\text{reconstruction}} + \mathrm{Loss}_{\text{KL divergence}} \quad (1)$$
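A reconstruction term plus a KL-divergence term is the standard variational autoencoder objective, so the sketch below assumes a VAE with a diagonal-Gaussian encoder and a standard-normal prior, and uses a squared-error reconstruction term (the text does not state which reconstruction loss is used). The encoder and decoder networks themselves are omitted; only the reparameterized sampling and the two loss terms are shown, in NumPy.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_divergence(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, averaged over the batch."""
    return float(np.mean(-0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)))


def reconstruction_loss(x, x_hat):
    """Squared error summed over composition columns, averaged over the batch (assumption)."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))


def vae_loss(x, x_hat, mu, log_var):
    """Total loss = reconstruction loss + KL divergence."""
    return reconstruction_loss(x, x_hat) + kl_divergence(mu, log_var)


# Example batch: 4 compositions (6 element fractions) encoded to a 2-D latent space
rng = np.random.default_rng(0)
x = rng.dirichlet(np.ones(6), size=4)
mu, log_var = rng.normal(size=(4, 2)), np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)  # would be fed to the decoder
print(z.shape, vae_loss(x, x, mu, log_var))
```

In a real implementation the encoder would output `mu` and `log_var` and the decoder would produce `x_hat` from `z`, with the stated learning rate, weight decay, batch size and epoch count passed to the optimizer.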

The input data pass through the encoder of the differential autoencoder, which maps the composition data to a two-dimensional latent space; the decoder then maps the two-dimensional latent data back to the original composition data, and the training parameters of the differential autoencoder are adjusted according to the final loss.

The resulting two-dimensional latent space of the composition data in this embodiment is shown in FIG. 2. Composition points from different alloy systems form regions with clearly distinguishable boundaries in the two-dimensional latent space.

Next, a neural network classifier is trained that labels the region above 600 HV as the high-hardness region and the region below 600 HV as the low-hardness region. The data are randomly split 4:1 into a training set of 222 entries and a test set of 56 entries. The network training parameters of the classifier are: an initial learning rate of 0.0001, a batch size of 32, and 400 training epochs, with five-fold cross-validation. The classifier reaches an accuracy of 0.89 on the training set and 0.85 on the test set; in materials science, an accuracy above 0.6 is generally regarded as good.
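The labelling rule and the five-fold cross-validation can be sketched as below. The classifier itself (a neural network trained with learning rate 0.0001, batch size 32 and 400 epochs) is left out, and treating exactly 600 HV as low hardness is an assumption, since the text does not specify the boundary case. The helper names are hypothetical.

```python
import numpy as np

THRESHOLD_HV = 600.0


def hardness_labels(hv, threshold=THRESHOLD_HV):
    """1 = high-hardness region (> threshold), 0 = low-hardness region."""
    return (np.asarray(hv, dtype=float) > threshold).astype(int)


def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of correctly classified points."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```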

Step 3: Sample the latent space with a Gaussian mixture model, screening 1000-3000 samples to obtain the sample points involved in prediction;

The present invention trains a Gaussian mixture model on the composition data set of this system. To determine the most suitable number of Gaussian components, this embodiment uses the elbow method, which evaluates how the number of components affects the goodness of fit and looks for an inflection point beyond which adding more Gaussian components no longer significantly improves the fit. According to the elbow method, the optimal number of Gaussian components is 6, with an average negative log-likelihood of -1.16.
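The text does not say which elbow criterion is used. One common heuristic, sketched below as an assumption, picks the point farthest from the straight line joining the first and last points of the curve. The curve itself could come, for example, from scikit-learn's `GaussianMixture`, whose `score` method returns the per-sample average log-likelihood, so the negative log-likelihood for k components is `-GaussianMixture(n_components=k).fit(X).score(X)`.

```python
import numpy as np


def elbow_index(ks, scores):
    """Index of the point farthest from the chord joining the first and last points.

    ks: candidate component counts; scores: e.g. average negative log-likelihood per k.
    """
    x = np.asarray(ks, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Normalize both axes to [0, 1] so neither dominates the distance.
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    p0 = np.array([xn[0], yn[0]])
    d = np.array([xn[-1], yn[-1]]) - p0
    d /= np.linalg.norm(d)
    rel = np.column_stack([xn, yn]) - p0
    dist = np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0])  # perpendicular distance to the chord
    return int(np.argmax(dist))
```

For a curve that falls steeply up to k = 4 and then flattens, such as [8.0, 6.0, 4.0, 2.0, 1.9, 1.8, 1.7, 1.6] for k = 1 to 8, the function selects k = 4.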

After determining the optimal number of Gaussian components, the present invention samples the Gaussian mixture model (with 6 Gaussian components) using a Markov chain Monte Carlo method.

This embodiment first randomly draws an initial sample from the Gaussian mixture model and performs 10,000 iterations. In each iteration, a proposed next sample is generated from the current sample using a multivariate normal distribution, and the current and proposed samples are passed together to the classifier trained in Step 2, which outputs the classification probability of each. If the acceptance probability of the proposed sample is higher than that of the current sample, the proposal is kept as the sample for the next iteration; otherwise it is discarded. This yields 1,352 sample points for subsequent screening, which are considered high-entropy alloy composition points likely to have high hardness.

It should be noted that the 10,000 iterations proceed as follows: starting from an initial sample, each iteration generates a proposed sample; if the proposal is kept, it becomes the starting sample for the next iteration, which generates the next proposal, for 10,000 iterations in total. The samples still retained after the 10,000 iterations are used as sample points; in total, 1,352 sample points were retained.
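A minimal sketch of the sampling loop, assuming `prob_high` wraps the Step 2 classifier and that the initial latent point has already been drawn from the 6-component Gaussian mixture (for example with scikit-learn's `GaussianMixture.sample`). The acceptance rule follows the text literally: a proposal is kept only if its probability is higher, which makes the chain a stochastic hill climb; a standard Metropolis-Hastings step would instead accept with probability min(1, p_new/p). The step size and function name are assumptions.

```python
import numpy as np


def classifier_guided_chain(z0, prob_high, n_iter=10_000, step=0.1, seed=0):
    """Random-walk proposals in the 2-D latent space, kept only if prob_high increases.

    prob_high: callable mapping a latent point to the classifier's high-hardness probability.
    Returns every accepted latent point, in order of acceptance.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    p = prob_high(z)
    cov = step**2 * np.eye(z.size)  # isotropic proposal covariance (assumption)
    kept = []
    for _ in range(n_iter):
        proposal = rng.multivariate_normal(z, cov)
        p_new = prob_high(proposal)
        if p_new > p:  # rule as described: keep only if the proposal scores higher
            z, p = proposal, p_new
            kept.append(z.copy())
    return np.array(kept).reshape(-1, z.size)
```

The accepted points would then be decoded back to compositions before hardness prediction.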

Step 4: Build multiple machine learning models using different feature combinations, train and evaluate them, screen out the qualified models, and combine these into an ensemble model;

The present invention uses machine learning methods such as SVR, CatBoost, LightGBM, back-propagation neural networks, random forest, XGBoost (extreme gradient boosting) and AdaBoost.

First, the Spearman rank correlation coefficient is used to identify features whose correlation exceeds 0.95. The Spearman rank correlation coefficient measures the strength and direction of the monotonic relationship between two variables; it ranges from -1 to +1, and the closer its absolute value is to 1, the stronger the correlation. When two features are highly correlated, one can stand in for the other. The Spearman rank correlation coefficients between the different features are shown in FIG. 3.
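The redundancy filter can be sketched as follows: Spearman's coefficient is the Pearson correlation of the ranks, and a feature is kept only if its absolute correlation with every already-kept feature is at most 0.95. Keeping the earlier feature of a correlated pair is an assumption, and this simple ranking does not average ties; `scipy.stats.spearmanr` or pandas `DataFrame.corr(method="spearman")` handle ties properly.

```python
import numpy as np


def ranks(a):
    """Ranks 0..n-1; ties are broken by order rather than averaged (assumption: few ties)."""
    r = np.empty(len(a))
    r[np.argsort(a, kind="stable")] = np.arange(len(a))
    return r


def spearman_matrix(X):
    """Spearman correlation = Pearson correlation of the column ranks."""
    R = np.column_stack([ranks(X[:, j]) for j in range(X.shape[1])])
    return np.corrcoef(R, rowvar=False)


def drop_correlated(X, names, threshold=0.95):
    """Greedily keep a feature only if |rho| <= threshold against every feature already kept."""
    rho = np.abs(spearman_matrix(X))
    keep = []
    for j in range(X.shape[1]):
        if all(rho[j, k] <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]
```

Because the coefficient works on ranks, a feature that is a monotonic but nonlinear function of another (for example `exp(a)` versus `a`) is flagged as perfectly correlated.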

It should be noted that the "two variables" above refer to the general definition; in practice, the comparison involves the features other than composition, together with hardness.

Models are then trained on different training-set distributions using the iterative feature-addition method: starting from an initial subset containing only the element fractions, new features are added one at a time, and in each iteration the change in the model's RMSE determines the final feature set. For each machine learning algorithm, 100 different model instances were built and trained; these include SVR, CatBoost, LightGBM, back-propagation neural networks, random forest and XGBoost, each applied to different combinations of feature sets and training distributions. After training, the models were evaluated with the root mean square error (RMSE), a common metric of the difference between a model's predictions and the observed values; a lower RMSE usually indicates better prediction accuracy. The RMSE of each model at the different training-set ratios is shown in FIG. 4.

After all models were trained and evaluated, the 20 models with the lowest RMSE were selected from the 100 models built by each machine learning method. These best-performing models were used to build the final ensemble model, which improves the accuracy and robustness of prediction by combining the predictions of multiple models. In this way, the ensemble model is expected to combine the strengths of the individual models, reduce possible overfitting, and ultimately give more reliable predictions on unseen data.
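The top-20 selection per algorithm can be sketched in plain Python. Representing each trained model as a `(model, rmse)` pair grouped by algorithm name is an assumption made for illustration, and `build_ensemble` is a hypothetical name.

```python
def build_ensemble(candidates, k=20):
    """Pool the k lowest-RMSE models of each algorithm.

    candidates: {algorithm name: [(model, rmse), ...]}
    """
    pool = []
    for pairs in candidates.values():
        best = sorted(pairs, key=lambda pair: pair[1])[:k]  # ascending RMSE
        pool.extend(model for model, _ in best)
    return pool
```

With 100 candidates for each of two algorithms, the pooled ensemble holds 40 models.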

Step 5: Use the ensemble model to predict the hardness of the participating sample points;

According to an embodiment of the present invention, the machine learning models trained in Step 4 predict the hardness of the high-entropy alloy composition points under screening; an ensemble learning method is used to predict the hardness of each specific composition point.

The core advantage of ensemble learning is its fusion of multiple models. In the present invention, several different machine learning models are combined. Each model is trained independently and captures different characteristics and underlying patterns of the data. In the prediction stage, the outputs of all models are considered together. This draws on the strengths of each individual model, offsets the limitations that any single model may have, and thereby improves overall prediction performance.

Specifically, for each composition point we calculated both the mean and the standard deviation of the hardness values predicted by the models in the ensemble. The predicted mean is the ensemble's collective estimate of the hardness at that composition point. The standard deviation is a quantitative measure of the uncertainty of that prediction. Twenty models are retained for each machine learning method, and in this embodiment a total of 180 machine learning models take part in predicting the hardness of the composition points.
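The ensemble statistics for one composition point can be sketched as follows. The sketch assumes that each retained model is exposed as a callable that maps a feature vector to a predicted hardness. Using the population standard deviation (`statistics.pstdev`) is also an assumption, because the text does not say whether the population or the sample form is used.

```python
from __future__ import annotations

from statistics import mean, pstdev
from typing import Callable, Sequence

# A trained model is represented as a callable: feature vector -> predicted hardness (HV).
Predictor = Callable[[Sequence[float]], float]


def ensemble_stats(models: Sequence[Predictor], x: Sequence[float]) -> tuple[float, float]:
    """Return (mean, population standard deviation) of the ensemble's predictions at x."""
    predictions = [model(x) for model in models]
    return mean(predictions), pstdev(predictions)
```

Calling `ensemble_stats` once per sampled point yields the (μ, σ) pairs that the ranking step consumes.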

Step 6: Rank the candidate points with the utility function, select experimental points for preparation, and test the hardness of the samples;

If the hardness obtained in Step 6 does not meet the preset requirement, the feature data of the tested alloy compositions are added to the data set. Steps 4 to 5 are then repeated until the hardness obtained in Step 6 meets the preset requirement.

According to an embodiment of the present invention, the utility function is the upper confidence bound (UCB) function:

$$\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x) \tag{3}$$

where μ(x) is the predicted mean and σ(x) is the predicted standard deviation at composition point x. To balance exploitation of the model against exploration of unknown points, κ is set to 0.2. The UCB value serves only as a ranking score that combines the predicted mean and standard deviation.
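Equation (3) and the ranking step can be sketched as follows. The candidate dictionary maps a composition label to its ensemble (μ, σ). Setting `top_n = 3` to match the three composition points selected in this embodiment is an assumption about how the ranking is cut off.

```python
from __future__ import annotations


def ucb(mu: float, sigma: float, kappa: float = 0.2) -> float:
    """Upper confidence bound of Eq. (3): predicted mean plus kappa times predicted std."""
    return mu + kappa * sigma


def rank_by_ucb(
    candidates: dict[str, tuple[float, float]],
    kappa: float = 0.2,
    top_n: int = 3,
) -> list[str]:
    """Return the top_n candidate labels sorted by descending UCB score.

    `candidates` maps a composition label to its ensemble (mu, sigma).
    """
    ranked = sorted(
        candidates,
        key=lambda name: ucb(*candidates[name], kappa=kappa),
        reverse=True,
    )
    return ranked[:top_n]
```

With a small κ such as 0.2, the ranking is dominated by the predicted mean. The uncertainty term mainly breaks near-ties in favour of less-explored points.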

Finally, we selected Al45Co23Cr18Cu1Fe7Ni5, Al44Co15Cr22Cu5Fe8Ni6, and Al44Co16Cr15Fe13Ni12 as the final composition points.

To ensure alloy quality, we used only metals with a purity above 99.9% as raw materials. Preparation begins by grinding the surfaces of the raw materials to remove oxide layers that could affect alloy performance. The materials are then cleaned ultrasonically to remove any remaining impurities and contaminants. Finally, they are dried thoroughly in an oven to eliminate problems caused by moisture.

With the raw materials prepared, we calculated the mass of each metal required from the molar ratio of each alloy composition. The ultrasonically cleaned metals were then weighed accordingly for melting. The alloys are prepared by induction melting, which provides a uniform and controllable heating environment that promotes thorough fusion of the metals. To ensure compositional homogeneity, each alloy sample is melted at least six times. After each melt, the alloy is allowed to cool and is flipped over so that the components are evenly distributed.
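The mass calculation weights each element's molar fraction by its atomic mass. The following is a sketch that assumes the composition is given in atomic percent and uses standard atomic weights. The 50 g ingot mass in the usage lines is a hypothetical value, not taken from the text.

```python
from __future__ import annotations

# Standard atomic weights (g/mol) of the six elements in the Al-Co-Cr-Cu-Fe-Ni system.
ATOMIC_MASS = {
    "Al": 26.9815,
    "Co": 58.9332,
    "Cr": 51.9961,
    "Cu": 63.546,
    "Fe": 55.845,
    "Ni": 58.6934,
}


def charge_masses(composition: dict[str, float], total_mass_g: float) -> dict[str, float]:
    """Convert a molar (atomic-percent) composition into the mass of each element
    needed for an ingot of total_mass_g grams."""
    # Relative mass contribution of each element: molar amount times atomic weight.
    weights = {el: x * ATOMIC_MASS[el] for el, x in composition.items() if x > 0}
    total = sum(weights.values())
    return {el: total_mass_g * w / total for el, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical 50 g ingot of one of the selected compositions.
    masses = charge_masses({"Al": 44, "Co": 15, "Cr": 22, "Cu": 5, "Fe": 8, "Ni": 6}, 50.0)
    for element, grams in masses.items():
        print(f"{element}: {grams:.3f} g")
```

Because the masses are normalised by their sum, the composition does not need to add up to exactly 100.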

After preparation, we tested the hardness of the resulting high-entropy alloy samples. The test checks whether the hardness meets our requirement for high-performance materials, namely 800 HV or above. If the measured hardness meets this criterion, the alloy preparation is considered successful, and further performance evaluation and application exploration can proceed.

If the hardness does not reach 800 HV, the measured hardness values of the three composition points are recorded and fed back into the machine learning data set. This feedback mechanism lets us refine the models continuously and improve the accuracy of the alloy design through iterative learning.

Next, we repeat Steps 2 to 5, retraining, validating, and screening the models on the updated data set to optimize the alloy composition. Through this dynamic iteration, the design gradually approaches the ideal alloy ratio and ultimately yields a high-hardness high-entropy alloy composition.
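The closed loop of prediction, UCB ranking, testing, and feedback can be sketched as follows. `fit_predict` and `measure` are hypothetical callables. They stand for model retraining with prediction, and for alloy preparation with hardness testing, respectively. Two simplifying assumptions are made. First, the sketch keeps a fixed candidate pool rather than re-running the autoencoding and sampling of Steps 2 and 3. Second, it removes tested points from that pool.

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def design_loop(
    dataset: List[Tuple[Point, float]],
    candidates: Sequence[Point],
    fit_predict: Callable[[List[Tuple[Point, float]], Sequence[Point]], List[Tuple[float, float]]],
    measure: Callable[[List[Point]], List[float]],
    target_hv: float = 800.0,
    kappa: float = 0.2,
    batch: int = 3,
    max_rounds: int = 10,
) -> Tuple[int, List[Tuple[Point, float]]]:
    """Closed-loop design: predict, rank by UCB, test, and feed results back.

    Returns (round number, compositions that reached target_hv); the list is
    empty if max_rounds is exhausted. `dataset` is extended in place.
    """
    pool = list(candidates)
    for round_no in range(1, max_rounds + 1):
        stats = fit_predict(dataset, pool)  # one (mu, sigma) per point in pool
        order = sorted(range(len(pool)),
                       key=lambda i: stats[i][0] + kappa * stats[i][1],
                       reverse=True)
        chosen = [pool[i] for i in order[:batch]]
        measured = measure(chosen)  # prepare alloys and test hardness
        hits = [(c, h) for c, h in zip(chosen, measured) if h >= target_hv]
        if hits:
            return round_no, hits
        # Target not reached: add the measured points to the training data.
        dataset.extend(zip(chosen, measured))
        pool = [p for p in pool if p not in chosen]
    return max_rounds, []
```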

The embodiments of the present invention are provided for illustration and description. They are not intended to be exhaustive or to limit the invention to the disclosed forms. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in them or make equivalent substitutions for some of their technical features.

Claims

1. A high-hardness high-entropy alloy composition design method based on ensemble learning, characterized in that the design method comprises the following steps:

S1: Obtain a data set for predicting the hardness of high-entropy alloys based on a high-entropy alloy system. The system is Al-Co-Cr-Cu-Fe-Ni. For this system, multiple compositions and their corresponding hardness data are collected and combined with the physical characteristics of high-entropy alloys to form a data set for further analysis. The data set contains 100-300 entries;

S2: Autoencode the composition data in the data set to obtain a latent space of the hardness distribution of the high-entropy alloy;

S3: Sample the latent space with a Gaussian mixture model, and select 1000-3000 samples as the sample points involved in prediction;

S4: Select different feature combinations for multiple machine learning models, build multiple models, train and evaluate the models, screen out qualified models and then form an integrated model;

S5: Predict the participating sample points with the integrated model to obtain the hardness prediction results;

S6: Rank the candidate points with the utility function, select experimental points, prepare samples, and test their hardness, which must reach 800 HV or above. If the measured hardness is below the 800 HV standard, repeat S2 to S5: retrain, verify, and screen the models on the updated data set to optimize the alloy composition design and ultimately achieve a high-hardness high-entropy alloy composition. If the hardness reaches the 800 HV standard, carry out further performance evaluation and application exploration.

2. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 1, characterized in that: in S1, 278 compositions and their corresponding hardness data are collected for the Al-Co-Cr-Cu-Fe-Ni system and combined with the physical characteristics of high-entropy alloys to form an initial data set; the initial data set is randomly divided into a training set and a test set in a ratio of 4:1, giving 222 entries in the training set and 56 entries in the test set.
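The following is a sketch of the 4:1 random split. It assumes a seeded shuffle and rounds the test-set size to the nearest integer. With 278 records, this gives 56 test entries and 222 training entries. The seed value is arbitrary.

```python
from __future__ import annotations

import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split records into (train, test); test_fraction=0.2 gives 4:1."""
    rng = random.Random(seed)  # seeded for reproducibility; the seed value is arbitrary
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = round(len(records) * test_fraction)  # 278 * 0.2 = 55.6 -> 56
    test = [records[i] for i in indices[:n_test]]
    train = [records[i] for i in indices[n_test:]]
    return train, test
```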

3. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 1, characterized in that: in S2, a latent representation of the Al-Co-Cr-Cu-Fe-Ni high-entropy alloy data set is learned with a variational autoencoder, whose encoder and decoder are both neural networks. The encoder maps the input composition data into a lower-dimensional, two-dimensional latent space. The decoder reconstructs the original composition data from these latent representations. The training parameters of the variational autoencoder are adjusted by comparing the loss.

4. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 3, characterized in that: in S2, the latent space is divided into a high-hardness region, where hardness is greater than 600 HV, and a low-hardness region, where hardness is less than 600 HV. A neural network classifier is trained to distinguish the two regions.

5. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 4, characterized in that the network training parameters of the neural network classifier are as follows: an initial learning rate of 0.0001, a training batch size of 32, and 400 training epochs.

6. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 5, characterized in that: in S3, an initial sample is drawn at random from the Gaussian mixture model, and 10,000 iterations are performed. In each iteration, a proposed next sample is first generated from the current sample using a multivariate normal distribution. The current and proposed samples are then passed together to the classifier trained in S2, which outputs the classification probability of each. If the acceptance probability of the proposed sample is higher than that of the current sample, the proposed sample is retained as the sample for the next iteration; otherwise it is discarded. In this way, 1,352 sample points are obtained for the subsequent screening.
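The classifier-guided sampling described in this claim can be sketched as a greedy random walk. `high_prob` is a hypothetical stand-in for the classifier's probability of the high-hardness class. The following are assumptions: the isotropic Gaussian proposal with step size `step`, and collecting the accepted proposals as the output points. Unlike Metropolis-Hastings, this rule never accepts a proposal with a lower probability.

```python
from __future__ import annotations

import random
from typing import Callable, List, Sequence


def classifier_guided_walk(
    start: Sequence[float],
    high_prob: Callable[[Sequence[float]], float],
    n_iter: int = 10_000,
    step: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Greedy random walk in the latent space guided by a classifier.

    A proposal is drawn from an isotropic normal centred on the current point
    and kept only if the classifier rates it more likely to be high-hardness.
    Accepted proposals are collected as candidate sample points.
    """
    rng = random.Random(seed)
    current = list(start)
    current_p = high_prob(current)
    accepted: List[List[float]] = []
    for _ in range(n_iter):
        proposal = [c + rng.gauss(0.0, step) for c in current]
        proposal_p = high_prob(proposal)
        if proposal_p > current_p:  # strict greedy acceptance
            current, current_p = proposal, proposal_p
            accepted.append(list(proposal))
    return accepted
```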

7. A high-hardness and high-entropy alloy composition design method based on ensemble learning according to claim 6, characterized in that: in S4, the models involved in the integration include but are not limited to SVR, CatBoost, LightGBM, Back Propagation Neural Networks, Random Forest, XGBoost, and AdaBoost models.

8. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 7, characterized in that: in S4, the models are trained under different training-set distributions using iterative feature addition. The procedure starts from an initial subset that contains only the elemental compositions and adds new features one at a time. In each iteration, the final feature set of the model is selected according to the change in the model's RMSE, which is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the true hardness, $\hat{y}_i$ is the hardness predicted by the model, and $n$ is the number of samples.

9. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 8, characterized in that: in S5, the machine learning models trained in S4 are used to predict the hardness of the high-entropy alloy composition points involved in the screening, and the mean and standard deviation of the hardness values predicted by the integrated model are calculated.

10. The high-hardness high-entropy alloy composition design method based on ensemble learning according to claim 9, characterized in that: in S6, the utility function is the upper confidence bound (UCB) function, calculated as:

$$\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x) \tag{3}$$

where μ(x) is the predicted mean and σ(x) is the predicted standard deviation at composition point x. To balance exploitation of the model against exploration of unknown points, κ is set to 0.2.