1. Introduction

The aero engine is the core component of an aircraft, with an extremely complex system and structure [1]. Under harsh operating conditions and long working hours, it is prone to mechanical wear failures. The lubricating oil system effectively reduces friction between engine parts, and a significant increase in the metal content of the lubricating oil generally indicates serious mechanical wear of some engine parts [2]. Engine wear is therefore usually monitored through oil analysis. Machine learning, one of the most active research topics in artificial intelligence [3], can mine latent relationships in large amounts of data through various algorithms [4], which gives it great advantages in building prediction models.

In this paper, an ensemble algorithm is combined with oil analysis to design a fault diagnosis method based on the return sampling strategy extreme random tree (RSS-ERT), where the return sampling strategy means sampling with replacement, to predict the engine wear state. The data set is divided into a training set and a test set, and a fault diagnosis model based on the extreme random tree algorithm is then built. Compared with the random forest algorithm, this algorithm avoids over-fitting of the trained model and complex parameter optimization, and improves accuracy. The method also introduces more randomness into model training, which improves generalization performance. The coefficient of determination ($R^2$) is used to evaluate the model, and fault diagnosis is performed by inputting the oil sample data of an aero engine into the trained model. The results demonstrate the effectiveness and superiority of the proposed algorithm.

2. Decision Tree

2.1 Basic model

The decision tree is a widely used inductive learning algorithm. In classification problems, a decision tree classifies samples by their features, forming a ‘tree’ that contains a series of if-then rules.
Mathematically, the tree can be interpreted as a conditional probability distribution defined on the feature space and the class space [5]. As shown in Figure 1, a tree consists of nodes connected by directed edges. Nodes fall into two categories: internal nodes (IN) and leaf nodes (LN). Each IN corresponds to a feature, and each LN represents a class label. Classification starts at the root node: at an internal node, the sample moves to the child node that matches its value of that node's feature, each child node corresponding to one value of the feature. The recursion ends at a leaf node, whose class label is returned. This completes the classification process.

2.2 Generating a regression decision tree

The key to generating a decision tree is selecting the best splitting feature. In this paper, the mean square error (MSE) is used as the splitting criterion to generate the regression tree. Let $X$ be the input variable and $Y$ the output variable to which $X$ is mapped. The training data set is $K = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$. The decision tree can then be built recursively; the specific steps are as follows:
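The regression tree of Section 2.2 can be illustrated with a minimal pure-Python sketch of the standard least-squares CART procedure; it is not necessarily the paper's exact step list. For every feature $j$ and cut-point $s$ (here assumed to be midpoints between consecutive distinct values), it picks the pair minimizing $\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2$, where $c_1$ and $c_2$ are the mean targets of the two resulting regions, and recurses until a depth limit is reached or a node is pure. Prediction walks from the root through internal nodes to a leaf, as described in Section 2.1. The function names, the tuple node layout and the `max_depth`/`min_samples` stopping rules are hypothetical choices for illustration.

```python
from statistics import fmean

def sse(ys):
    """Sum of squared errors around the mean, i.e. len(ys) * MSE."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y):
    """Exhaustively try every feature j and cut-point s; return (loss, j, s) or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate cut-points: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            s = (a + b) / 2
            left = [y[i] for i, row in enumerate(X) if row[j] <= s]
            right = [y[i] for i, row in enumerate(X) if row[j] > s]
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Grow a regression tree recursively; a leaf stores the mean target of its samples."""
    if depth >= max_depth or len(y) < min_samples or sse(y) == 0.0:
        return fmean(y)
    split = best_split(X, y)
    if split is None:  # no feature takes two distinct values in this node
        return fmean(y)
    _, j, s = split
    left = [i for i, row in enumerate(X) if row[j] <= s]
    right = [i for i, row in enumerate(X) if row[j] > s]
    return (j, s,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples))

def predict(node, x):
    """Start at the root; at each internal node (j, s, left, right) go left if x[j] <= s."""
    while isinstance(node, tuple):
        j, s, left, right = node
        node = left if x[j] <= s else right
    return node  # a leaf value

tree = build_tree([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(tree)  # → (0, 6.5, 1.0, 5.0)
```

In the toy example the single split at 6.5 separates the two target groups perfectly, so both children are pure leaves and recursion stops.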
3. RSS-ERT

The extreme random tree (ERT) is an ensemble learning algorithm in the Bagging family and a variant of the random forest algorithm. In its standard form it uses the whole data set to grow every tree, with the CART decision tree [6] as the base weak learner. The decision trees are independent of one another, and for a regression problem the final result is the arithmetic mean of the outputs of all decision trees [7]. The return sampling strategy (RSS), i.e. sampling with replacement, draws a separate training set for each tree, which makes it easier to train a larger number of more random models. Because samples are drawn with replacement, some samples are never selected for a given tree: the probability that a given sample is missed in $N$ draws is $(1 - 1/N)^N \approx e^{-1} \approx 0.368$, so about 37% of the samples are left out. These are called out-of-bag (OOB) samples [8]. Using the OOB samples as a test set makes the validation of the model more convincing, and the outputs of the individual decision trees can be weighted to increase accuracy. The algorithm flow of RSS-ERT is shown in Figure 2. The overall sample set is $D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_N, y_N)\}$, where $X_\alpha$ is a $1 \times Q$ row vector, $y_\alpha$ is the true output value of sample $X_\alpha$, $\alpha = 1, 2, \ldots, N$, and $N$ is the number of samples. The algorithm steps are as follows:
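The combination of return sampling and extreme random trees described above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes the usual extremely randomized split rule (in each node, one cut-point is drawn uniformly per feature and the best of these is kept by squared error; all $Q$ features are tried), gives every tree its own sample drawn with replacement, averages the trees for the regression output, and records each tree's OOB indices. The names `fit_rss_ert`, `oob_predictions`, the `max_depth`/`min_leaf` limits and the seed handling are hypothetical, and the OOB-based weighting of trees mentioned above is not implemented.

```python
import random
from statistics import fmean

def sse(ys):
    """Sum of squared deviations from the mean (len(ys) * MSE)."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def build_extra_tree(X, y, rng, max_depth=4, min_leaf=2):
    """Grow one extremely randomized regression tree; a leaf stores the mean target."""
    if max_depth == 0 or len(y) < 2 * min_leaf or sse(y) == 0.0:
        return fmean(y)
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue  # feature is constant in this node
        t = rng.uniform(lo, hi)  # one random cut-point per feature instead of a full search
        left = [i for i, v in enumerate(col) if v <= t]
        right = [i for i, v in enumerate(col) if v > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        loss = sse([y[i] for i in left]) + sse([y[i] for i in right])
        if best is None or loss < best[0]:
            best = (loss, j, t, left, right)
    if best is None:
        return fmean(y)
    _, j, t, left, right = best

    def grow(idx):
        return build_extra_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, max_depth - 1, min_leaf)

    return (j, t, grow(left), grow(right))

def predict_tree(node, x):
    """Descend from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_rss_ert(X, y, n_trees=50, seed=0):
    """Train n_trees extra trees, each on its own sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    forest, oob = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # RSS: n draws with replacement
        forest.append(build_extra_tree([X[i] for i in idx], [y[i] for i in idx], rng))
        oob.append(set(range(n)) - set(idx))  # indices this tree never saw
    return forest, oob

def predict_forest(forest, x):
    """Regression output: arithmetic mean of all tree predictions."""
    return fmean([predict_tree(tree, x) for tree in forest])

def oob_predictions(forest, oob, X):
    """Predict each sample using only the trees for which it was out-of-bag."""
    preds = {}
    for i, x in enumerate(X):
        outputs = [predict_tree(tree, x) for tree, bag in zip(forest, oob) if i in bag]
        if outputs:  # a sample drawn by every tree has no OOB prediction
            preds[i] = fmean(outputs)
    return preds
```

With $N = 100$, the expected OOB fraction per tree is $(1 - 1/100)^{100} \approx 0.366$, so each sample is typically out-of-bag for roughly a third of the trees, and its OOB prediction can be scored with $R^2$ without a separate test set.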
4. EXPERIMENT AND RESULT ANALYSIS

4.1 Experimental data

In this paper, the model is trained on element concentration data from oil sample analysis of a military aero engine and is used to predict the engine wear state. The data set contains 212 samples covering the lubricating oil condition of the engine in the normal state and in wear states. The concentrations of Fe, Al, Cu, Cr, Ag, Ti and Mg are the attributes, and the label F is the fault type of the engine: “1” denotes the normal state, “2” denotes inter-shaft bearing wear, and “3” denotes inter-shaft bearing wear with cage fracture. Part of the data is shown in Table 1.

Table 1. Element concentration data of a military aero engine oil sample (part).
In this paper, 30% of the data set is used as the test set and 70% as the training set. Because the number of samples differs greatly between labels, the data are class-imbalanced. Stratified sampling is therefore adopted, so that each label keeps the same proportion in the training and test sets. The distribution of the data samples is shown in Table 2.

Table 2. Sample distribution of experimental data.
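The stratified 70/30 split can be sketched as follows: indices are grouped by label, each group is shuffled, and 30% of it (rounded) goes to the test set, so the label proportions are preserved in both subsets. The function name, the seeded shuffle and the per-class rounding rule are assumptions for illustration, and the class counts in the example are hypothetical, not the counts in Table 2.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.3, seed=0):
    """Split indices into (train, test) so each label keeps the same test fraction."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    train, test = [], []
    for lab in sorted(by_label):
        idx = by_label[lab]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test size
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

labels = [1] * 100 + [2] * 20 + [3] * 10  # hypothetical class counts, not Table 2
train, test = stratified_split(labels)
print(len(train), len(test))  # → 91 39
```

Splitting each class separately guarantees that the minority fault states still appear in the test set, which a purely random 30% draw does not.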
4.2 Experimental results and analysis

The RSS-ERT model is used to estimate the influence (importance) of the concentrations of Fe, Al, Cu, Cr, Ag, Ti and Mg on the accuracy of predicting the engine health parameters. The importance values of the seven elements are normalized and arranged in descending order, as shown in Figure 3. The normalized importance values of Fe, Al, Cu, Cr, Ag, Ti and Mg are 1.0, 0.32, 0.43, 0.38, 0.28, 0.03 and 0.06, respectively. Figure 3 shows that the Fe concentration in the lubricating oil has the greatest impact on the prediction of engine health. The importance values of Cr, Ag and Al differ little from one another and are not high, and the Ti concentration is the least important.

The model is evaluated with the coefficient of determination $R^2$ [9]:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values over the sample points, and $N$ is the number of samples. The closer $R^2$ is to 1, the better the prediction.

This paper selects the typical linear regression model [10] and the random forest model [11] as references and obtains the $R^2$ of each model through 5-fold cross-validation on the data set. The values are shown in Figure 4. The linear regression model has the lowest $R^2$, at 0.244, which indicates that a linear model is not suitable for the engine oil fault diagnosis in this experiment and that ensemble algorithms have more advantages. Among the ensemble algorithms, the random forest improves $R^2$ significantly, to 0.724. The RSS-ERT model proposed in this paper achieves the highest value (0.972), which is very close to 1, and gives the best prediction.

The impact of the number of decision trees on the prediction performance of this model is shown in Figure 5. In the experiment, the initial number of decision trees is 50, and it is increased by 10 at each step up to 500.
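The evaluation quantities used in this section can be sketched in pure Python: $R^2 = 1 - SS_{res}/SS_{tot}$ with $SS_{tot}$ taken around the mean of the actual values, 5-fold cross-validation, and the confusion matrices compared in Figure 6. This is a sketch under assumptions, not the paper's code: the fault labels 1, 2 and 3 are assumed to be predicted as a continuous regression output and snapped to the nearest label before tabulation, folds are assigned by striding through one seeded shuffle, and `r2_score`, `k_fold_indices`, `cross_val_r2` and `confusion_matrix` are hypothetical names, with `fit` and `predict` supplied by the caller.

```python
import random

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of the actual values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) once, then yield (train, test) index lists for each of k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for f in range(k):
        test = order[f::k]  # every k-th shuffled index is held out in fold f
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test

def cross_val_r2(fit, predict, X, y, k=5):
    """Mean R^2 over k folds; fit(X, y) -> model and predict(model, x) are caller-supplied."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        scores.append(r2_score([y[i] for i in test], preds))
    return sum(scores) / len(scores)

def confusion_matrix(y_true, y_pred, labels=(1, 2, 3)):
    """Rows: true state; columns: predicted state after snapping to the nearest label."""
    pos = {lab: r for r, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        nearest = min(labels, key=lambda lab: abs(lab - p))  # ties go to the earlier label
        m[pos[t]][pos[nearest]] += 1
    return m

print(r2_score([1, 2, 3], [2, 2, 2]))  # → 0.0
print(confusion_matrix([1, 1, 2, 3, 3], [1.1, 1.8, 2.2, 2.6, 3.0]))  # → [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
```

A constant prediction equal to the mean gives $R^2 = 0$, which is why values near 1 indicate a useful model. A tree-count sweep like the one in Figure 5 amounts to repeating `cross_val_r2` with a forest of `n_trees` trees for `n_trees` in `range(50, 501, 10)`.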
As Figure 5 shows, the prediction performance fluctuates as the number of decision trees increases in the initial stage and gradually stabilizes in the later stage, with $R^2$ at 0.969 and a lowest value of 0.954, which reflects the superiority of this model. To show more intuitively whether the models predict the correct results, the confusion matrices [12] of the RSS-ERT model and the random forest model are obtained. Figure 6 indicates that the predictions of the RSS-ERT model are consistent with the measured data and are accurate. The predictions of the random forest model agree less well with the measured data than those of the RSS-ERT model: one sample in state “1” is not predicted correctly, and errors also occur in the predictions for states “2” and “3”. Figures 4–6 together show that the model generated by the proposed algorithm has higher prediction accuracy.

5. CONCLUSIONS

For faults caused by mechanical wear of aero engines, a fault diagnosis method based on RSS-ERT combined with oil analysis is proposed in this paper. Compared with traditional ensemble learning algorithms, the classification model reduces the training workload and helps ensure independence between the base classifiers. The method introduces more randomness into model training, improves generalization performance, and has wide application prospects. Applied to the prediction of wear faults of a military aero engine, the RSS-ERT algorithm shows better performance and higher prediction accuracy.

REFERENCES

[1] Cui, J. G., Li, Y., Cui, X., Wang, J. L., Jiang, L. Y. and Yu, M. Y., “An improved fault diagnosis method for lubricating oil system of DBN aero engine,” Journal of Shenyang University of Aeronautics and Astronautics, 37(06), 49–54 (2020).
[2] Zhang, C. Y., “Research on diagnosis of aero engine mechanical wear fault,” Equipment Management and Maintenance, (24), 147–148 (2020).

[3] Wu, J. D., “Research on Machine Learning Method for Aero Engine Gas Path Fault Diagnosis and Prediction,” Master’s Thesis, Nanjing University of Aeronautics and Astronautics, Nanjing (2019).

[4] Robert, C., “Machine learning: A probabilistic perspective,” Chance, (2) (2014). https://doi.org/10.1080/09332480.2014.914768

[5] Cheng, Y., Li, Y. X. and Li, F., “Soil moisture inversion in lightning river basin based on extreme random tree,” Journal of Remote Sensing, 25(04), 941–951 (2021).

[6] Wang, Y., Chen, D. Y. and Tang, Y. X., “Stock forecasting based on CART decision tree and boosting method,” Journal of Harbin University of Technology, 24(06), 98–103 (2019).

[7] Breiman, L., “Random forests,” Machine Learning, 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324

[8] Zhang, C. L., Yang, G. J., Li, H. L., Tang, F. Q., Liu, C. and Zhang, L. Y., “Remote sensing inversion of winter wheat leaf area index based on random forest algorithm,” China Agricultural Science, 51(05), 855–867 (2018).

[9] Zhao, S. H., “Analysis and evaluation of influencing factors on goodness of fit R²,” Journal of Northeast University of Finance and Economics, (03), 56–58 (2003).

[10] Chen, X., “Analysis on influencing factors of air quality index based on linear regression model: Taking Dazu District of Chongqing as an example,” Environmental Impact Assessment, 43(05), 79–82 (2021).

[11] Zhu, P. J., Luo, N. X. and Zhao, Q. S., “Prediction of maximum water increase of typhoon storm surge based on stochastic forest model,” Surveying and Mapping Bulletin, (12), 71–74, 82 (2021).

[12] Gao, Y. and Peng, W., “Research on application of classification prediction based on Keras,” Journal of Shanxi Datong University (Natural Science Edition), 35(05), 26–30 (2019).