Monitoring the condition of city bus engines by analysing used oil using PCA and K-Means clustering

Duarte, Margarida Oliveira; Margalho, Luís Melo; Gołębiowski, Wojciech; Mendes, Mateus; Farinha, José Manuel Torres; Šarkan, Branislav

doi:10.1038/s41598-026-39045-x

Article
Open access
Published: 17 February 2026

Margarida Oliveira Duarte^1,2,
Luís Melo Margalho^1,2,
Wojciech Gołębiowski³,
Mateus Mendes^1,2,4,
José Manuel Torres Farinha^1,2 &
…
Branislav Šarkan⁵

Scientific Reports volume 16, Article number: 9392 (2026)

Abstract

Lubricating oil plays a critical role in the operation and longevity of internal combustion engines, particularly in diesel-powered urban buses. Monitoring its degradation and contamination offers valuable insights into engine condition, enabling the adoption of Condition-Based Maintenance (CBM) strategies. This study applied multivariate statistical techniques - specifically Principal Component Analysis (PCA) and K-Means clustering - to a dataset of in-service oil samples from a fleet using Lukoil 10W40. The objective was to identify distinct patterns of oil degradation associated with operational conditions and maintenance profiles. Four operational clusters were identified: urban-use buses with frequent idling and stop-start cycles; new engines in the break-in phase with high levels of wear metals; mature engines under regular operating conditions; and an outlier bus affected by oil leakage and extreme contamination. The results highlight that conventional indicators such as mileage are not fully reliable measures of oil degradation, reinforcing the need for condition-based monitoring using physicochemical and contamination variables.

Introduction

In various sectors, such as industry and transportation, internal combustion engines are essential for the operation of heavy vehicles, such as buses and trucks, as well as for urban mobility. The continuous operation of these engines under extreme conditions leads to wear, friction, and overheating of their internal components.

Lubricating oils are crucial to the power and efficiency of an engine. The primary functions of oil are to reduce friction between moving metal parts, dissipate the heat generated during operation, protect against corrosion, and prevent unwanted deposits. In addition to these general functions, oils contain additives that enhance specific properties, such as detergents that help keep internal components clean and antioxidants that reduce thermal degradation.

The choice of the proper lubricant depends on viscosity, additive content, and the specific needs of each engine. Oils with different viscosities and characteristics are produced to meet the working requirements of each engine, ensuring effective performance and extending engine life. Additionally, the aging of lubricating oil must be monitored over time to know when to replace it, prevent mechanical failure, and reduce maintenance costs.

Although mileage is traditionally used as the main indicator for determining oil change intervals, several studies have shown that mileage alone does not reliably correlate with oil degradation, especially in urban fleets. Stop-and-go driving, prolonged idling, engine load variability, and environmental conditions can accelerate or delay degradation independently of distance traveled, making mileage-based maintenance strategies potentially inefficient.

Given the complex operating conditions of municipal fleets, oil degradation results from the combined effect of multiple parameters rather than a single variable. Therefore, multivariate analytical tools are required to capture interactions between physicochemical properties, operating profiles, and maintenance histories. Such tools allow for a more accurate characterization of degradation patterns compared to conventional univariate indicators.

The aim of this study was to evaluate the degradation patterns of used engine oils from a municipal bus fleet through the application of multivariate statistical analysis, specifically Principal Component Analysis (PCA) and K-Means clustering, to systematically analyze physicochemical properties of in-service oil samples. This research sought to identify distinct degradation profiles that reflect different operational and maintenance conditions within fleet operations, establish correlations between oil degradation patterns and specific operational parameters, and develop a data-driven approach for optimizing maintenance intervals and oil change strategies based on actual degradation characteristics rather than solely time-based schedules.

This study further provides a novel contribution by applying PCA and clustering techniques directly to real FTIR spectra obtained from in-service engine oil samples. While these methods have been used in controlled laboratory studies, their application to real operational fleet datasets remains limited, offering new insights into degradation behavior under actual working conditions.

Literature review

Oil analysis plays a crucial role in anticipating and preventing engine failures in urban buses, enabling fleet managers to adopt predictive maintenance strategies. Studies such as the one by Gołębiowski et al.¹ examined oil degradation in real-world fleet operations using established methods like Fourier Transform Infrared Spectroscopy (FTIR) and advanced chemical analysis to assess key oil properties, including oxidation, nitration, sulfation, and soot levels. These methods are widely utilized in the industry due to their accuracy in identifying chemical deterioration in lubricants. The findings indicated that oil degradation is significantly impacted by frequent stop-and-go conditions common in urban bus operations.

Similarly, Lenza et al.² analyzed used lubricants from urban bus diesel engines operating under typical urban traffic. By measuring parameters such as viscosity, total base number, soot content, and wear metal concentrations, the study identified accelerated oil degradation and particulate buildup caused by stop-and-go driving cycles. Using statistical and multivariate techniques, they correlated lubricant condition with engine usage and maintenance timing, demonstrating that regular oil analysis can optimize oil change intervals and improve predictive maintenance, leading to enhanced engine reliability and reduced operational costs.

Another conventional technique involves tracking viscosity and the Total Base Number (TBN), as illustrated by Gołębiowski et al.³. Their research, which measured viscosity at 40 °C and 100 °C, demonstrated that relying solely on fixed mileage intervals for oil changes is inefficient; instead, regular monitoring of oil properties is essential for maintaining system dependability. These physicochemical parameters are vital in assessing oil performance and preventing damage to engine components. Likewise, Raposo et al.⁴ conducted systematic oil condition monitoring in an urban bus fleet, employing traditional techniques to analyze oxidation, soot, wear metals, and nitration. Using mathematical models such as exponential smoothing and statistical tools such as Student's t-distribution, they were able to predict the degradation of components containing iron. Their findings led to an increase in oil change intervals from 20,000 km to 25,000 km, generating cost savings.

Moreover, Schutz⁵ highlighted that routine oil analysis can detect internal wear and contamination, aiding in failure prediction and enhancing scheduled maintenance. Traditional techniques such as particle analysis, viscosity measurement, acidity assessment, additive content monitoring, and detection of contaminants (especially trace metals and water) remain reliable and widely applicable across various fleet types. Further reinforcing the necessity of periodic oil assessments, Rodrigues et al.⁶ applied multivariate analysis to lubricating oils, demonstrating how fluctuations in silicate content, sulfation, and viscosity influence lubricant efficiency. Gołębiowski et al.⁷ explored the use of engine oil as a diagnostic medium for predicting failures in urban buses. They correlated data from a suite of analyses, namely kinematic viscosity, FTIR spectroscopy (for oxidation, nitration, TBN, TAN, and contaminants), elemental analysis via HDXRF, and blotter spot tests, with service logs from two city buses, and identified specific oil degradation signatures associated with critical cooling system failures. Their research confirms that a multidimensional analysis of in-service oil provides a more granular and responsive condition monitoring tool than simple interval-based changes. It highlights the need for such integrated diagnostic approaches in managing maintenance for heterogeneous vehicle fleets operating under variable conditions, and underscores the importance of regular oil evaluations in averting mechanical failures.

Several studies highlight the unreliable correlation between mileage and engine oil degradation. Wolak & Krasodomski⁸ indicate that oil degradation levels vary significantly even within similar mileage ranges due to different operating conditions, such as city traffic, highway routes, and hybrid systems. This variability suggests that mileage alone is not a reliable indicator of oil degradation. They also state that engine oil degradation is influenced by factors such as engine design and urban usage patterns, which can lead to premature or delayed oil changes, further emphasizing the inadequacy of mileage-based oil change strategies. Another study, by Rappaport et al.⁹, found that predicting an oil's expected life based on mileage is statistically unreliable due to varying driving habits and local operating conditions.

In addition, Karanović et al.¹⁰ presented a case study demonstrating the practical benefits of lubricant oil analysis in supporting maintenance decision-making. Their research focused on the integration of oil analysis data into maintenance planning processes for industrial machinery, showing how timely diagnostics can enhance equipment availability and reduce unplanned downtime. The study emphasized the value of continuous monitoring of oil parameters - such as viscosity, contamination, and wear particles - to enable early detection of mechanical issues. By leveraging oil analysis as a decision-support tool, maintenance teams were able to shift from reactive to proactive strategies, resulting in optimized maintenance intervals, lower operational costs, and improved asset reliability. This case study reinforces the broader applicability of oil analysis across sectors, including transportation fleets, and supports the integration of data-driven techniques into predictive maintenance frameworks.

Urban fleets, particularly those using CNG and diesel engines, require advanced multivariate tools for effective oil condition monitoring. Studies on urban buses powered by CNG and diesel engines show higher degradation rates for CNG engine oils due to higher thermal and mechanical stress. This necessitates the use of multivariate tools to monitor various degradation parameters such as oxidation, nitration, and additive depletion^11,12. Comparative assessments of engine oil behavior in urban transport fleets highlight the need for multivariate analysis to evaluate the performance of different oil formulations and their impact on maintenance costs and engine reliability^11,13.

Macian et al.¹⁴ extended these findings by conducting fleet trials to examine the performance and degradation of Low-Viscosity Oils (LVO) in heavy-duty engines. The study involved 39 buses powered by Diesel and CNG engines, testing four different lubricants, including two LVOs. Results showed that LVOs could improve fuel economy without negatively affecting engine durability. However, the study also highlighted that LVO effectiveness depends on variables such as engine type and operational conditions. These findings reinforce the need for careful assessment before implementing LVOs in different fleets to maximize their benefits.

Beyond traditional oil analysis methods, recent studies have investigated innovative approaches to improve predictive maintenance efficiency. Raposo et al.¹⁵ developed a condition monitoring framework based on oil analysis, utilizing predictive models and time series analysis to extend oil change intervals safely. Advanced statistical examination of specific oil pollutants, such as soot, allowed for more precise predictions of oil degradation, improving vehicle availability while reducing maintenance costs. Another novel approach involves using Artificial Neural Networks (ANN) and Principal Component Analysis (PCA), as explored in⁶. These techniques analyzed extensive maintenance data, considering variables like mileage, soot content, and metal levels, to determine optimal oil change timings. Neural networks demonstrated high accuracy, surpassing human expert predictions, while PCA identified the most critical variables influencing oil degradation.

The application of Principal Component Analysis (PCA) and clustering to FTIR data is emerging as a novel approach in engine oil performance analysis. Nagy et al.¹⁶ propose a methodology using FTIR spectroscopy combined with multivariate data analysis, including PCA, to rapidly analyze large vehicle fleets or sample sizes. This approach is cost-efficient and intuitive, allowing for the identification of differences in oil condition and underlying degradation mechanisms. Another study¹⁷ developed machine learning models, including clustering techniques, to predict engine oil degradation. These models effectively capture complex relationships in the data, enhancing predictive maintenance and reducing costs.

Sousa¹⁸ introduced a model correlating mechanical failures with vehicle emissions, incorporating a predictive system based on exhaust opacity and lubricant properties. This approach suggests that continuously monitoring emissions and oil condition can enhance diagnostic precision and further lower maintenance expenses. Additionally, the impact of alternative fuels, such as biodiesel, was examined in¹⁹ using atomic absorption spectroscopy and analytical ferrography to assess oil degradation under varying operational conditions. The study revealed that biodiesel contamination significantly alters viscosity, emphasizing the importance of continuous monitoring and predictive maintenance strategies.

A recent study²⁰ focused on extending lubricant lifespan in internal combustion engines to enhance performance and durability. Over time, lubricants degrade, and the formation of a protective tribofilm layer depends on precursor elements in the oil. Many current models fail to fully capture the evolution of these precursors and their connection to tribofilm development. To address this issue, researchers proposed a mass balance approach for these precursors, incorporating mathematical models and optimal control strategies to maximize tribofilm formation. Their results showed that by adjusting variables like temperature and pressure, lubricant lifespan can be prolonged, engine wear reduced, and overall system efficiency improved.

Camba et al.²¹ proposed an integrated approach to interpret used oil analysis in diesel engines of a truck fleet, combining critical limits, correlation analysis, PCA, and change point detection. First, critical limits are established for key oil parameters to help identify abnormal conditions and potential failures. Next, correlation analysis is used to examine the relationships among different oil properties, providing insights into how these variables interact. Principal Component Analysis (PCA) is then employed to reduce the dimensionality of the dataset, highlighting the most significant variables and patterns that affect oil condition. Finally, change point detection is applied to recognize points in time when the statistical behavior of monitored parameters shifts, which may indicate changes in engine operation or oil performance. Together, these methods offer a robust and comprehensive framework for evaluating lubricating oil condition. The application to real data from a fleet revealed four main lubricant degradation mechanisms and identified optimal maintenance timings based on operating hours, thus optimizing predictive maintenance.

In addition to these established approaches, recent research has explored the use of advanced multivariate tools to improve the interpretation of lubricant degradation in diverse fleet environments. Nguyen et al.²² highlighted that principal component analysis (PCA) is one of the most widely used multivariate techniques for handling full Fourier transform infrared (FTIR) spectral regions in lubricant studies, alongside methods such as partial least squares (PLS), interval partial least squares (iPLS) and principal component regression (PCR), because it condenses strongly correlated spectral variables into a smaller set of orthogonal components that capture the dominant sources of chemical variation in the oil matrix. In their review of advanced chemometric tools for oil diagnostics, PCA is presented as a key exploratory method that supports the identification of latent degradation patterns, additive depletion and contamination processes, improving interpretability before building predictive models linking FTIR features to kinematic viscosity, total acid number (TAN), total base number (TBN) or remaining useful life (RUL) indicators.

A complementary approach was proposed by Wolak et al.²³ who combined FTIR spectroscopy with PCA to rapidly classify oil samples from large fleets. Their results highlighted that PCA can effectively distinguish degradation trends and detect anomalies even when traditional parameters like viscosity or TBN remain within acceptable limits. This demonstrates the potential of multivariate FTIR-based models to enhance early detection of abnormal degradation mechanisms and to support more precise, condition-based maintenance planning.

These modern approaches complement conventional techniques by offering deeper insights into oil degradation prediction, enhancing engine performance, and optimizing maintenance schedules. Integrating traditional and advanced methodologies enables a more precise assessment of oil conditions, leading to cost savings and improved fleet reliability.

Despite extensive research on engine oil diagnostics in urban bus fleets, most studies indicate that correlations between mileage and oil degradation are inconsistent and unreliable due to complex operational patterns and variable maintenance practices. This limitation highlights the need for robust multivariate statistical tools capable of extracting meaningful insights from real fleet data. Addressing this gap, the present work implements Principal Component Analysis and K-means clustering methods on actual FTIR spectra of used engine oils to systematically reveal degradation profiles and operational patterns in urban fleet engines.

Methodology and experimental setup

Context and data source

The dataset presented in Table 1 includes samples of different types of lubricating oils used in urban passenger buses, from brands such as Lukoil, Urania and Orlen, with viscosity grades including 15W40, 10W40, and 5W30. These are multigrade oils: the "W" (winter) rating specifies low-temperature performance, so that the oil flows adequately at cold start while retaining sufficient viscosity at operating temperature. These lubricants are widely used in public transport fleets, providing engine protection, operational efficiency, and, in some cases, additional benefits such as fuel economy and emission reduction. The oils are used in various bus models, including Autosan, Iveco and others.

Table 1 – Summary of the dataset, showing the number of engine oil samples by lubricant oil type.

For each oil sample, the dataset includes specific details, such as the operator, oil type and brand, vehicle identification number, model, year of production, engine type and specifications, mileage on oil, overall vehicle mileage, sample taken date, and measurement time and date.

In addition, the dataset contains information on various essential properties that help assess oil performance and condition. These properties include oil characteristics such as viscosity at 40 °C and 100 °C, TBN (Total Base Number), TAN (Total Acid Number), oxidation, nitration, sulfation, phosphorus antiwear content, remaining antiwear additive percentage, and remaining amine antioxidant percentage.

Additionally, the dataset records contaminants such as Diesel fuel, soot, water, and ethylene glycol, which are crucial indicators of oil degradation and potential engine issues. All of these parameters were measured with an ERASPEC OIL FTIR mid-infrared spectrometer (Eralytics GmbH, Austria) in accordance with the ASTM E2412-10 and ASTM D7412 standards. The FTIR spectra of used oils were compared with those of the corresponding fresh oils. The resulting compositional changes were quantified from band intensity variations, using both peak height and peak area measurements.
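The peak-height and peak-area comparison between used and fresh spectra comes down to simple arithmetic on a difference spectrum. The sketch below is a minimal illustration that assumes both spectra share the same wavenumber grid. The integration window, the band position at 1730 cm⁻¹ and the synthetic spectra are illustrative assumptions. This is not the ASTM E2412 procedure or the ERASPEC software's internal method.

```python
import numpy as np


def band_metrics(wavenumbers, used_abs, fresh_abs, band):
    """Peak height and integrated area of the (used - fresh) absorbance within a band."""
    wn = np.asarray(wavenumbers, dtype=float)
    diff = np.asarray(used_abs, dtype=float) - np.asarray(fresh_abs, dtype=float)
    # FTIR axes are often stored high-to-low; sort so integration runs in increasing x.
    order = np.argsort(wn)
    wn, diff = wn[order], diff[order]
    low, high = band
    mask = (wn >= low) & (wn <= high)
    x, y = wn[mask], diff[mask]
    peak_height = float(y.max())
    # Trapezoidal rule written out to avoid np.trapz / np.trapezoid naming differences.
    area = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))
    return peak_height, area


# Synthetic spectra on a 1 cm^-1 grid, stored high-to-low (1900 -> 1600 cm^-1).
wavenumbers = np.arange(1900.0, 1599.0, -1.0)
fresh = 0.05 * np.ones_like(wavenumbers)
# Hypothetical oxidation-like band centred at 1730 cm^-1 that grows in the used oil.
used = fresh + 0.3 * np.exp(-0.5 * ((wavenumbers - 1730.0) / 15.0) ** 2)

# ASSUMPTION: illustrative integration window, not taken from the study.
height, area = band_metrics(wavenumbers, used, fresh, band=(1670.0, 1800.0))
print(f"peak height = {height:.3f} A, band area = {area:.2f} A*cm^-1")
```

Peak height responds to the band maximum only. The area also captures broadening of the band, which is one reason to report both.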

For a more detailed evaluation of oil condition and potential wear, the dataset also includes information on coolant leaks and on the presence of metals such as manganese (Mn), chromium (Cr), copper (Cu), iron (Fe), nickel (Ni), lead (Pb), tin (Sn), and titanium (Ti), which may indicate internal component wear.

Finally, the dataset contains data on additive elements, such as calcium (Ca), molybdenum (Mo), phosphorus (P), sulfur (S), and zinc (Zn), which play a crucial role in maintaining oil performance and protecting the engine. Elemental concentrations were determined using an HD Maxine multi-element analyzer (XOS, USA), which employs high-definition X-ray fluorescence (HDXRF) technology for trace element detection.

The data provides a comprehensive overview of the performance of lubricating oils used in urban buses, highlighting aspects such as wear, degradation, and contamination. In this study, wear refers to the mechanical loss of metal material from engine surfaces; degradation denotes the chemical deterioration of the lubricant due to oxidation, nitration, or additive depletion; and contamination corresponds to the intrusion of substances such as fuel, water, or coolant, which alter the oil’s composition and accelerate its degradation. The analysis reveals significant differences between lubricants in terms of oxidation, nitration, and sulfation, indicating variations in operating conditions and oil change intervals. Additionally, TAN and TBN values help assess the oil’s ability to neutralize acids and maintain its effectiveness over time.

Another important aspect is the presence of contaminants such as soot, Diesel, water, and ethylene glycol, which may indicate engine issues like inefficient combustion or coolant leaks. The analysis of metals, including iron, copper, and lead, also provides insights into internal engine wear, allowing for the identification of potential failures in specific components.

Finally, the depletion of essential additives, such as calcium, molybdenum, phosphorus, and zinc, suggests that the oil’s protective capacity may be decreasing, particularly in high-mileage lubricants.

After a general analysis of the data, it becomes essential to investigate the correlations between parameters related to wear, degradation, and contamination that influence the performance of engine oil over time. The correlation matrix helps to identify relationships between oil mileage and wear metals, as well as the impact of factors such as soot, oxidation, and contamination by water and fuel. This comparative analysis of different oil types provides insights into which properties have the greatest impact on lubricant longevity and efficiency, while also offering valuable indications of potential mechanical failures or excessive wear in specific engine components.

The dataset includes multiple oil samples collected from the same buses at different oil change intervals. However, each sample was treated as an independent observation, since it reflects a unique lubricant condition at a specific mileage-on-oil value, rather than cumulative engine mileage. Therefore, no longitudinal or temporal dependency was assumed between repeated samples.

Prior to applying multivariate analysis techniques, several preprocessing steps were conducted. Records with fully missing or non-relevant values were excluded. Missing values representing less than 5% of the valid entries of a variable were imputed using that variable's mean. k-NN imputation (k = 5) was also tested as an alternative but did not improve model stability, so mean imputation was retained.

All numerical variables were standardized using z-score normalization, ensuring comparability and preventing dominance of variables with larger scales. Logarithmic transformation was considered for highly skewed variables; however, the final dataset did not require transformation according to skewness thresholds. PCA was performed on the correlation matrix, as the variables have different measurement units.
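The preprocessing chain described above can be sketched with scikit-learn: exclude empty records, mean-impute, z-score scale, then run PCA. This is an illustration under assumptions, not the authors' script. The column names and data are synthetic. Variables at or above the 5% missing threshold are simply dropped here, which the text does not specify. The final lines check that PCA on z-scored variables corresponds to PCA on the correlation matrix: the explained-variance ratios equal the correlation-matrix eigenvalues divided by their sum.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def preprocess_and_pca(df, max_missing_frac=0.05):
    """Drop empty records, mean-impute sparse gaps, z-score, then fit PCA."""
    # Records in which every variable is missing carry no information.
    df = df.dropna(how="all")
    # ASSUMPTION: variables whose missing fraction reaches the threshold are dropped
    # rather than imputed; the study only states that gaps below 5% were mean-imputed.
    keep = df.columns[df.isna().mean() < max_missing_frac]
    X = SimpleImputer(strategy="mean").fit_transform(df[keep])
    # z-score: zero mean and unit (population) variance for each variable.
    Z = StandardScaler().fit_transform(X)
    pca = PCA().fit(Z)
    return list(keep), X, Z, pca


# Hypothetical data: 40 samples, five correlated variables with illustrative names.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 1))
df = pd.DataFrame(base + 0.5 * rng.normal(size=(40, 5)),
                  columns=["visc40", "visc100", "TBN", "soot", "Fe"])
df.loc[3, "soot"] = np.nan            # 1/40 = 2.5% missing -> imputed
df.loc[[0, 5, 9, 12], "Fe"] = np.nan  # 4/40 = 10% missing -> dropped in this sketch

keep, X, Z, pca = preprocess_and_pca(df)

# PCA on z-scored data is equivalent to eigendecomposing the correlation matrix.
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
print(keep)
print(np.round(pca.explained_variance_ratio_, 3))
print(np.round(eigvals / eigvals.sum(), 3))
```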

The analytical variables follow standardized measurement units: elemental concentrations are reported in mg/kg (ppm), viscosity is given in mm²/s (cSt) at 40 °C and 100 °C, TBN and TAN in mg KOH/g, contaminants such as Diesel and water in % v/v, and soot in % m/m. Detection limits of trace metals ranged from 0.7 to 90 mg/kg depending on the element, based on HDXRF instrument specifications.

Data processing, PCA, and clustering analysis were performed using Python (Version 3.10) in the Spyder environment, using the libraries pandas, NumPy, scikit-learn, matplotlib, and seaborn.

Correlation analysis

The correlation matrix allows for the identification of relationship patterns between variables, making it possible to understand how different factors interact in a given context. In this specific case, it is essential to analyze the relationship between the mileage traveled by buses and the concentration of wear metals in engine oil.

To achieve this, all available samples were consolidated into a pandas DataFrame in Python and used to generate a Pearson correlation matrix, shown in Fig. 1. The correlation coefficients were calculated on standardized (z-score) variables to ensure comparability among parameters measured on different scales. The matrix represents the general correlation structure of all oil types combined, providing an overall view of the relationships between physicochemical, additive, and wear-related variables in the fleet. This global approach was adopted to identify common interaction patterns before performing specific analyses by oil type in subsequent sections.
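The correlation step can be sketched in a few lines of pandas. This minimal sketch uses synthetic stand-ins for mileage on oil, a wear metal and soot; the column names and values are hypothetical, not the study's data. It also shows that Pearson coefficients are unchanged by z-score standardization, because r is invariant to shifting and positive rescaling of either variable. Standardization therefore matters for PCA rather than for the correlation matrix itself.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for three of the study's variables (relationships are arbitrary).
rng = np.random.default_rng(1)
n = 60
mileage = rng.uniform(5_000, 30_000, size=n)
df = pd.DataFrame({
    "mileage_on_oil": mileage,
    "Fe": rng.normal(20, 5, size=n) + 0.0002 * mileage,
    "soot": rng.normal(0.5, 0.2, size=n) + 0.00001 * mileage,
})

# Pearson correlation matrix on the raw variables.
corr_raw = df.corr(method="pearson")

# The same matrix after z-score standardization (population standard deviation).
z = (df - df.mean()) / df.std(ddof=0)
corr_z = z.corr(method="pearson")

print(corr_raw.round(2))
print("max |difference|:", float(np.abs(corr_raw - corr_z).to_numpy().max()))
```

A heatmap such as the one in Fig. 1 can then be drawn from `corr_raw`, for example with seaborn's `heatmap`.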

Although it is commonly anticipated that concentrations of wear metals in engine oil will increase with mileage on oil, our analysis revealed a negative correlation between these variables.

Specifically, oil samples from buses that traveled longer distances on the same oil fill exhibited lower levels of wear metals. This counterintuitive result suggests that operational or maintenance practices, such as permitting longer oil change intervals for vehicles in better mechanical condition, may influence the observed metal concentrations. These findings pertain exclusively to mileage on oil and do not reflect the overall mileage or intrinsic condition of the engine. They should not be read as implying that engine wear decreases as mileage increases.

To explore this further, Table 2 presents the correlations separately for each type of oil. The results for the individual oil types are consistent with the overall trends shown in Fig. 1, which represents the aggregated data for all oil samples. They are also in line with other findings reported in the literature, such as^6,7. However, these conclusions apply only to the present dataset; other datasets may exhibit different results.

Table 2 Correlation coefficients between mileage, wear metals, and oil properties for different oil types.

Methods

In this study, statistical research methods - namely Principal Component Analysis and K-Means clustering - were employed to identify latent degradation patterns in the fleet. PCA was used to reduce dimensionality and reveal the underlying structure of chemical, wear and additive-related variables, while K-Means enabled the segmentation of engines according to their operational and degradation profiles. The objective of applying these methods to the fleet was to determine how different operational conditions influence lubricant degradation and to identify groups of engines with similar behavior, supporting more accurate and data-driven maintenance decisions.
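The PCA and K-Means combination can be sketched as a scikit-learn pipeline: standardize, project onto principal components, then run K-Means on the component scores. Several choices here are assumptions for illustration rather than the study's configuration: the synthetic matrix, keeping two components, and clustering in PC space rather than on the original variables. The value k = 4 mirrors the number of clusters reported in the abstract, but the data are random.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 60 samples x 6 variables drawn around four random centres.
rng = np.random.default_rng(42)
centres = rng.normal(0.0, 5.0, size=(4, 6))
group = rng.integers(0, 4, size=60)
X = centres[group] + rng.normal(0.0, 1.0, size=(60, 6))

# Standardize, then keep the first two principal components (an assumed choice).
projector = make_pipeline(StandardScaler(), PCA(n_components=2))
scores = projector.fit_transform(X)
explained = projector.named_steps["pca"].explained_variance_ratio_

# K-Means with k = 4 on the PC scores; n_init restarts reduce sensitivity to seeding.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

print("variance explained by PC1, PC2:", np.round(explained, 3))
print("samples per cluster:", np.bincount(labels))
```

In practice, the cluster labels would then be read against the PCA loadings and the fleet metadata (engine family, production year, mileage on oil) to interpret what each group represents.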

Analysis of data and results

A more detailed analysis will focus on a single oil type: Lukoil 10W40, as it has the largest number of samples available. This selection ensures a more consistent dataset, minimizing variability caused by differences between oil formulations.

This part of the dataset, which will be used in the analysis, includes 25 samples from Cummins ISB6 engines, corresponding to buses manufactured between 2018 and 2020, and 20 samples from FPT INDUSTRIAL S.P.A. engines, from buses produced between 2016 and 2017. Additionally, it includes 12 samples from FPT INDUSTRIAL Tector engines, associated with buses manufactured between 2022 and 2023. Lastly, there is a single sample from an IVECO SPA ITA engine, representing a bus manufactured in 2009.
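Selecting the most frequent oil type and summarizing that subset by engine family takes a few lines of pandas. The column names ("oil", "engine", "year") and the five rows below are hypothetical. They only illustrate the filtering and counting described above, not the study's records.

```python
import pandas as pd

# Hypothetical sample log; column names and rows are illustrative only.
samples = pd.DataFrame({
    "oil": ["Lukoil 10W40", "Lukoil 10W40", "Orlen 15W40", "Lukoil 10W40", "Urania 5W30"],
    "engine": ["Cummins ISB6", "FPT Tector", "Cummins ISB6", "Cummins ISB6", "FPT Tector"],
    "year": [2019, 2022, 2018, 2020, 2023],
})

# Keep the oil type with the most samples to limit formulation-driven variability.
top_oil = samples["oil"].value_counts().idxmax()
subset = samples[samples["oil"] == top_oil]

# Number of samples and production-year range per engine family.
summary = subset.groupby("engine").agg(
    n_samples=("year", "size"),
    first_year=("year", "min"),
    last_year=("year", "max"),
)
print(top_oil)
print(summary)
```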

Data understanding: analysis of physicochemical degradation indicators

With the continuous use of a lubricating oil, its physicochemical properties change due to thermal degradation, oxidation, and contamination. This section evaluates the degradation of Lukoil 10W40 over time in bus engines, based on parameters such as:

Viscosity: Reflects the oil’s lubrication capability;
TBN (Total Base Number): Indicates the alkaline reserve to neutralize acids;
TAN (Total Acid Number): Measures the increase in acidity due to oxidation;
Oxidation: Assesses the degradation caused by exposure to oxygen and heat;
Nitration: Associated with the reaction of the oil with nitrogen oxides from combustion gases;
Sulfation: Indicates contamination by sulfur compounds.

The analysis of these variables will provide insights into the oil’s degradation and help determine the need for replacement, contributing to predictive engine maintenance and optimizing oil change intervals.

Figure 2 shows the relationship between oil mileage (x-axis) and viscosity at 40 °C and 100 °C (y-axis). In the first graph (viscosity at 40 °C), the red line represents the fitted regression trend, while the pink-shaded area indicates the 95% confidence interval. The correlation coefficient (r = 0.36) reveals a moderate positive association: viscosity tends to increase as the oil accumulates mileage. The corresponding p-value (0.005) means that, if no true relationship existed, a correlation at least this strong would arise by chance with a probability of only about 0.5%, so the trend is statistically significant. The increase in viscosity can be associated with oxidation, soot accumulation, the presence of wear metals, and the degradation of additives over time. These changes can compromise the lubricant's performance and potentially accelerate engine wear. Despite this trend, the substantial dispersion of the data points suggests that operational or engine-related factors other than mileage also influence viscosity.
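The p-values quoted in this section can be cross-checked from r and the sample size alone, because under the null hypothesis of zero correlation the statistic t = r·√((n − 2)/(1 − r²)) follows a Student t distribution with n − 2 degrees of freedom. The sketch below is a standard-library-only illustration, not the authors' code. It assumes that n = 58 (all samples described above) enters every correlation, and it integrates the t density numerically with Simpson's rule instead of calling a statistics package. Under that assumption the recomputed values should land close to the reported ones; small differences would be expected if some samples lacked a given measurement.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def pearson_p_value(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for H0: rho = 0 (requires |r| < 1 and an even number of steps)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Composite Simpson's rule for the integral of the t density over [0, t]
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-t < T < t), using the symmetry of the distribution
    return max(0.0, 1.0 - central)


if __name__ == "__main__":
    n = 58  # assumption: every sample described above enters each correlation
    for label, r in [("viscosity 40 C", 0.36), ("viscosity 100 C", 0.27),
                     ("TBN", -0.32), ("nitration", 0.35), ("oxidation", 0.22)]:
        print(f"{label:16s} r = {r:+.2f}  p = {pearson_p_value(r, n):.3f}")
```

With about 58 samples, |r| must exceed roughly 0.26 for p to fall below 0.05. This is why r = 0.22 for oxidation is not significant while r = 0.27 for viscosity at 100 °C just is.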

Similarly, the second graph in Fig. 2 illustrates the relationship between oil mileage and viscosity at 100 °C. The correlation coefficient (r = 0.27) suggests a weaker positive association than for viscosity at 40 °C, with a p-value of 0.038 indicating statistical significance. This increase in viscosity at high temperature may also be attributed to oil thickening due to oxidation, although shear degradation of viscosity modifiers, which tends to lower viscosity at 100 °C, may partly offset it. Although the trend remains consistent, the lower correlation implies that additional factors, such as operating conditions and oil formulation, might play a more significant role in determining viscosity at 100 °C.

The remaining figures in this section follow the same visual structure, composed of two complementary panels. The left panel displays the scatterplot with the fitted regression line and its confidence interval, illustrating the trend between oil mileage and the analysed parameter. The right panel shows a boxplot grouping the same variable into mileage intervals, providing a visual summary of its distribution, variability, and potential outliers. This dual representation allows both the overall trend and the dispersion of the data to be assessed simultaneously.
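The right-hand panels can be reproduced by binning each sample by oil mileage and computing boxplot statistics per bin. The sketch below is a hypothetical illustration using only the Python standard library. The bin edges are illustrative, and the 1.5 × IQR outlier rule is the common Tukey convention, which is assumed because the text does not state the whisker rule used in the figures. Plotting libraries may also interpolate quartiles slightly differently from the "inclusive" method used here.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Quartiles plus points outside the 1.5*IQR fences (Tukey's convention, assumed)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": med, "q3": q3, "outliers": outliers}


def summarize_by_interval(mileage_km, values, edges):
    """Group a parameter into mileage intervals [edges[i], edges[i+1]) and summarize each one."""
    groups = {}
    for km, v in zip(mileage_km, values):
        for a, b in zip(edges, edges[1:]):
            if a <= km < b:
                groups.setdefault(f"{a}-{b} km", []).append(v)
                break
    # quantiles() needs at least two points per interval
    return {label: boxplot_summary(vals) for label, vals in groups.items() if len(vals) >= 2}
```

For example, `summarize_by_interval(km, tbn, [0, 10000, 20000, 30000])` (hypothetical edges) would return one summary per mileage interval that contains at least two samples.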

Figure 3 illustrates the relationship between oil mileage and TBN. The X-axis represents oil mileage (km), while the Y-axis indicates TBN, which reflects the oil’s alkaline reserve to neutralize acids.

The correlation coefficient r = -0.32 suggests a weak to moderate negative correlation between oil mileage and TBN, meaning that as mileage increases, TBN tends to decrease. The p-value = 0.015 confirms that this correlation is statistically significant (p < 0.05). The trend line shows a progressive decline in TBN with increasing mileage, indicating oil degradation over time.

This reduction in TBN implies that the oil’s ability to neutralize acids diminishes, making it less effective in protecting the engine against corrosion. This suggests the need for oil replacement before TBN reaches a critical level. The data dispersion indicates variability in TBN values for the same mileage, likely influenced by factors such as operating conditions, fuel quality, or engine type.

Figure 4 illustrates the relationship between oil mileage and TAN. The X-axis represents oil mileage (km), while the Y-axis indicates TAN, which measures the acidity of the oil and reflects its degradation due to oxidation and contamination.

The red trend line shows a progressive increase in TAN with mileage, indicating that the oil undergoes oxidation and becomes more acidic over time.

This rise in acidity suggests that the oil’s ability to protect engine components from corrosion decreases with use. If TAN reaches critical levels, it may lead to increased wear and potential damage to the engine. Therefore, monitoring TAN is essential to determine the optimal oil change interval and ensure engine longevity.

Sulfation is related to oil contamination by sulfur compounds originating from fuel and combustion. The scatter plot in Fig. 5 indicates a moderate correlation between mileage and sulfation levels (r = 0.36, p = 0.006), suggesting that sulfation tends to increase with oil usage over time. The boxplot confirms this trend, showing that sulfation levels are higher in oil samples with greater mileage. This phenomenon can contribute to an increase in lubricant acidity and the corrosion of internal engine components, making it essential to monitor this parameter to prevent premature failures.

Nitration occurs due to the reaction of oil with nitrogen oxides from combustion, potentially leading to the formation of acidic compounds and harmful deposits in the engine. The scatter plot in Fig. 6 shows a moderate positive correlation between mileage and nitration levels (r = 0.35, p = 0.007), indicating a significant increase in this contaminant over time. The boxplot reinforces this trend, showing that samples with higher mileage exhibit elevated nitration levels. This behavior suggests that as the oil is used, there is greater exposure to combustion gases, contributing to lubricant degradation.

Oxidation is a natural process caused by the oil's exposure to oxygen and to the heat generated during engine operation. In Fig. 7, the scatter plot shows a weak positive correlation between mileage and oxidation levels (r = 0.22, p = 0.101): oxidation tends to increase as the oil is used, although the relationship is not statistically significant. The boxplot shows increasing oxidation at higher mileage intervals, suggesting that, in some cases, the oil may degrade more rapidly depending on operating conditions. Oxidation products tend to thicken the oil and reduce its lubricating capacity, so the oil should be replaced before oxidation reaches critical levels.

This initial analysis showed that oil degradation tends to progress as oil mileage increases. The evidence is the increase in TAN, sulfation, and nitration, the decrease in TBN, and a weaker, non-significant increase in oxidation. These changes indicate a loss of the lubricant's protective capacity, reinforcing the need for regular monitoring to prevent excessive engine wear.

Data understanding: evolution of wear metal indicators with oil mileage

The monitoring of wear metals in engine oil is essential for the Condition-Based Maintenance (CBM) of passenger buses, providing insights into component degradation over time. Typically, as the oil accumulates mileage, metal concentrations also increase due to mechanical wear, contamination, and additive depletion.

This section analyzes the iron (Fe) content in relation to oil mileage, with particular attention given to identifying patterns, correlations, and anomalies across different engine types or under abnormal operating conditions.

In the analysis, patterns and correlation trends between oil mileage and iron (Fe) content were evaluated for each engine type. As visualized in Fig. 8, the data points are differentiated according to engine type, allowing observation of both aggregated and engine-specific behaviors. For most engine types, the scatter distribution and the corresponding density plots suggest that Fe content does not consistently increase with mileage on oil. Specifically, some engines exhibit relatively stable Fe levels regardless of oil mileage, while others demonstrate greater variability. For example, the Cummins ISB6.7E6 206B EU6 engines (red points) show a broad dispersion of Fe content with little apparent trend, whereas the IVECO and FPT engines present denser distributions and distinct Fe profiles. There were no strong or uniform positive correlations observed between mileage on oil and Fe content for any engine type. These findings highlight the importance of individual engine characteristics in the wear process and suggest that aggregated analyses may obscure meaningful differences.

Figure 8 illustrates the relationship between oil mileage and iron (Fe) content. Contrary to theoretical expectations of a continuous increase, the observed data show a weak negative correlation.

This discrepancy between the expected and observed results can be attributed to several factors:

The oil performs robustly, almost independently of engine characteristics.
Real-world operating conditions introduce challenges in data analysis and lead to the presence of outliers:
One key factor is engine oil replenishment close to the time of sample collection. Topping up dilutes the used oil with fresh lubricant, which artificially lowers wear-metal concentrations and shifts parameters such as viscosity and Total Base Number (TBN) back towards fresh-oil values, thereby distorting the expected wear and degradation trends.
Furthermore, buses operate under varying mileage conditions, resulting in different wear rates and stages of component degradation. This variability complicates trend analysis, as wear progression does not follow a uniform pattern but rather a stochastic behavior influenced by multiple factors.
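The size of the replenishment effect can be estimated with a simple volume-weighted mixing rule. The sketch below is a hypothetical worked example. It assumes ideal mixing and equal densities of used and fresh oil, so that ppm values combine by volume. The sump and top-up volumes are illustrative and not taken from the fleet data.

```python
def after_top_up(v_sump_l, c_used, v_top_up_l, c_fresh):
    """Concentration after adding fresh oil to the sump (ideal volume-weighted mixing, assumed)."""
    return (v_sump_l * c_used + v_top_up_l * c_fresh) / (v_sump_l + v_top_up_l)


# Hypothetical case: 18 L of used oil at 40 ppm Fe topped up with 2 L of fresh oil (~0 ppm Fe)
print(after_top_up(18, 40.0, 2, 0.0))
```

The relative drop in a wear metal absent from fresh oil equals the top-up fraction, here 2/20 = 10%. A 10% top-up shortly before sampling can therefore hide roughly 10% of the accumulated wear signal, independently of engine condition.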

The subsequent sections present a comparative conclusion that highlights these findings and discusses their implications for maintenance strategies and the detection of abnormal operating conditions.

Modelling: feature extraction using principal component analysis

To further explore the dataset and identify underlying patterns in wear metal accumulation, Principal Component Analysis was applied. This technique reduces the dimensionality of the data while preserving the most relevant information, allowing for a clearer visualization of how different oil samples group based on their wear metal concentrations. PCA highlights key factors influencing metal accumulation, which may be related to engine type, operating conditions, or oil change intervals.

Following the PCA transformation, K-Means clustering was employed to classify the oil samples into distinct groups according to their wear characteristics. This approach helps uncover potential trends that are not immediately evident in traditional correlation analyses. By segmenting the data into clusters, it becomes possible to determine whether specific engine types or operational conditions lead to higher wear rates, abnormal metal accumulation, or unexpected degradation trends.

The K-Means clustering analysis was performed on the first five principal components, which together explain 71.15% of the total variance in the dataset. In other words, approximately 71% of the variance in the original variables can be represented by only five components. This substantially reduces the dataset's complexity and filters out part of the noise while retaining most of its informative value.
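The selection of components by cumulative explained variance can be sketched with NumPy as follows. This is an illustration, not the authors' pipeline. It assumes that each variable is standardized before PCA, which the text does not state but which is usual when variables such as ppm of metals, TAN, and viscosity have different units. Constant columns would have to be removed first to avoid division by zero.

```python
import numpy as np


def pca_explained_variance(X):
    """Standardize columns, then PCA via SVD. Returns (explained_ratio, components, scores)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # assumes no constant columns
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s**2 / (Z.shape[0] - 1)   # variance captured by each principal component
    ratio = var / var.sum()         # explained-variance ratio, in decreasing order
    scores = U * s                  # sample coordinates on the principal components
    return ratio, Vt, scores


def n_components_for(ratio, target):
    """Smallest number of leading PCs whose cumulative explained variance reaches `target`."""
    return int(np.searchsorted(np.cumsum(ratio), target) + 1)
```

With a samples-by-variables matrix `X`, `scores[:, :n_components_for(ratio, 0.7115)]` would be the reduced input passed to K-Means.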

Table 3 below summarizes the percentage of variance explained by each of the five principal components, along with the total variance they account for.

Table 3 Distribution of explained variance across principal components.


The number of clusters (k = 4) was determined based on two validation methods: (1) the Elbow Method applied to the Sum of Squared Errors (SSE), which revealed a clear inflection point between k = 3 and k = 5; and (2) the Silhouette Coefficient, which reached an optimal value at k = 4, balancing cluster cohesion and separation.
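The SSE (elbow) curve can be reproduced with a compact implementation of Lloyd's K-Means. The sketch below is NumPy-only and uses k-means++ seeding with several restarts. That initialization is an assumption, since the text does not say how centroids were initialized. The SSE it reports is the within-cluster sum of squared distances to the centroids. Identical duplicate samples in very small datasets could make the seeding probabilities undefined and would need handling in practice.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every sample."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def _kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability proportional to D(x)^2."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centroids)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's K-Means; returns (sse, labels, centroids) of the best of n_init restarts."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            labels = _assign(X, centroids)
            # Keep the previous centroid if a cluster happens to become empty
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        labels = _assign(X, centroids)
        sse = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
        if best is None or sse < best[0]:
            best = (sse, labels, centroids)
    return best


def elbow_curve(X, k_values, seed=0):
    """SSE for each candidate k, to be plotted against k when looking for the elbow."""
    return {k: kmeans(X, k, seed=seed)[0] for k in k_values}
```

The elbow is the k after which adding clusters stops reducing the SSE substantially. The silhouette coefficient is then used as a second, independent criterion.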

Although the average silhouette score (0.49) is considered moderate, it is acceptable for real-world engineering datasets, where overlapping degradation processes, additive depletion, and contamination behaviors naturally produce partially mixed boundaries. These conditions prevent perfectly separated clusters but still allow meaningful physical and chemical interpretation.

Figure 9 presents the silhouette coefficients for each cluster, illustrating the degree of cohesion within clusters and separation between them. Each horizontal bar corresponds to the silhouette value of an individual sample, indicating how similar that sample is to its own cluster compared with the nearest neighboring cluster. Higher silhouette values represent better-defined and more internally consistent clusters. The red dashed line marks the average silhouette score across all samples. Cluster 3 is not represented in Fig. 9 because it consists of a single sample and therefore has no internal variability.

Table 4 complements this analysis by providing the average silhouette score per cluster. Cluster 2 exhibits strong internal cohesion (0.6512) and Cluster 0 a moderate structure (0.4961). Cluster 1 has a notably lower score (0.2297), which indicates possible overlap or weak separation and suggests that it may represent transitional states or contain more heterogeneous samples. The score of 0.0000 for Cluster 3 does not indicate overlap: because this cluster contains a single sample, its silhouette value is zero by definition.
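Per-sample and per-cluster silhouette values can be computed directly from pairwise distances. The NumPy sketch below follows Rousseeuw's definition s(i) = (b − a)/max(a, b). Here a is the mean distance to the other members of the sample's own cluster and b is the smallest mean distance to another cluster. A sample alone in its cluster gets s(i) = 0 by convention, which is why a one-sample cluster averages to exactly zero. Euclidean distance in the PCA space is assumed, and at least two clusters are required.

```python
import numpy as np


def silhouette_samples(X, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b); singleton clusters get 0."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # Euclidean distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # s(i) = 0 by convention for a single-member cluster
        a = D[i, own].sum() / (own.sum() - 1)  # self-distance is 0, so divide by n - 1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


def silhouette_by_cluster(X, labels):
    """Average silhouette per cluster and over all samples."""
    s = silhouette_samples(X, labels)
    labels = np.asarray(labels)
    per_cluster = {c.item(): float(s[labels == c].mean()) for c in np.unique(labels)}
    return per_cluster, float(s.mean())
```

Because the singleton contributes a zero to the overall mean, a single outlying sample slightly lowers the average silhouette even when the other clusters are well separated.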

Table 4 Average silhouette score per cluster.


Figure 10 displays the results of the K-Means clustering applied to the PCA-transformed data. The X-axis represents the first principal component (PCA Component 1), while the Y-axis represents the second principal component (PCA Component 2). Each point in the graph corresponds to a sample from the dataset, colored according to the cluster to which it belongs. The color scale on the right indicates the cluster labels, ranging from 0 to 3. The separation between groups suggests that K-Means identified distinct patterns in the PCA-reduced data, consistent with different behaviors in the data distribution.

Table 5 Characterization of clusters.


Table 5 presents the summary statistics for each cluster identified through K-Means clustering. The table includes the number of records per cluster, the average mileage on oil, the total mileage of the vehicles, the average year of production of the buses, and the corresponding engine type. While no single variable fully explains the segmentation, some patterns emerge that suggest underlying influencing factors. For instance, Cluster 1 is composed exclusively of newer vehicles (2022–2023) with low overall mileage and FPT Tector 7 engines, while Cluster 0 includes only Cummins ISB6 engines from moderately aged vehicles (2018–2020). Cluster 2 appears more heterogeneous, combining different FPT engines from vehicles produced over a broader time span (2016–2022), with higher average mileage. Cluster 3, by contrast, consists of a single outlier: an older IVECO vehicle with exceptionally high accumulated mileage.

Modelling: cluster formation and interpretation via K-Means

To gain a deeper understanding of the factors driving cluster formation, a detailed analysis of PCA loadings was conducted. Loadings quantify the contribution of each original variable to the principal components, allowing for precise identification of which variables most influence the variance structure in the data and, by extension, the cluster separation.

Figure 11 and Table 6 present the absolute values of the loadings for the first three principal components. These values indicate how strongly each variable contributes to the respective component. The legend in Fig. 11 maps bar colors to variables. This color-coding is intended to make the most relevant variables for each principal component easier to identify and to support a more immediate interpretation of contribution patterns across the PCs.
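Ranking variables by absolute loading, as in Table 6, amounts to sorting the rows of the PCA component matrix. The NumPy sketch below treats loadings as the entries of the unit-length eigenvectors from an SVD of the standardized data. Some authors instead scale loadings by the square root of each eigenvalue, which changes the magnitudes but not the ranking within a component. The variable names in the usage line are placeholders for the real columns.

```python
import numpy as np


def top_loadings(X, names, n_pcs=3, top=5):
    """For each of the first n_pcs components, the `top` variables with the largest |loading|."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardization assumed, as in the PCA step
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    table = {}
    for k in range(n_pcs):
        abs_load = np.abs(Vt[k])
        order = np.argsort(abs_load)[::-1][:top]
        table[f"PC{k + 1}"] = [(names[j], round(float(abs_load[j]), 3)) for j in order]
    return table
```

For instance, `top_loadings(X, ["Zn", "P", "S", "Fe", "Cu", "TAN"])` (placeholder column list) returns, for PC1 to PC3, the five variables with the largest absolute loadings and their values.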

Table 6 Highest contributing variables to principal components PC1–PC3, based on loading values.


The analysis of loadings reveals distinct patterns in how variables influence each principal component, offering insights into the underlying structure of the data and the drivers of cluster separation. In PC1, variables such as Zn, P, and S, which are related to additive depletion and wear, emerge as the most influential, suggesting their relevance in distinguishing samples with different lubrication conditions. Soot [%] and Phosphorus Anti-wear, both associated with combustion by-products and additive wear, also play major roles. For PC2, the prominence of Mn, Cu, and Ni, all of which are metallic elements typically linked to engine wear, along with TAN (Total Acid Number), points to a component strongly associated with wear and oil degradation over time. Finally, PC3 highlights Cr as a dominant contributor, alongside Pb, TAN, Nitration, and Ca, reinforcing the interpretation of this component as capturing aspects of contaminant accumulation and chemical degradation. These findings suggest that metal content, additive levels, and degradation markers are central to the differentiation of clusters.

Figure 12 illustrates the relative importance of each variable in explaining the variance within the dataset, based on the weighted sum of their loadings on the principal components. The higher the value, the greater the variable’s contribution to the formation of the latent structures captured by PCA – and, consequently, the more relevant it is to the data segmentation.
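The text describes the importance in Fig. 12 as a weighted sum of loadings but does not give the weights. A natural choice, assumed here, is to weight each component's absolute loadings by that component's explained-variance ratio. Variables that load strongly on high-variance components then score highest. The NumPy sketch below implements that assumed formula, importance_j = Σ_k ratio_k · |v_jk| over the retained components.

```python
import numpy as np


def weighted_importance(X, names, n_pcs=5):
    """Variable importance = sum over retained PCs of (explained-variance ratio x |loading|)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardization assumed
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # explained-variance ratio of each component
    k = min(n_pcs, len(s))
    scores = (ratio[:k, None] * np.abs(Vt[:k])).sum(axis=0)
    return sorted(zip(names, scores.tolist()), key=lambda item: item[1], reverse=True)
```

Setting `n_pcs=5` matches the five components used for clustering. Other weightings, such as squared loadings, would be equally defensible and could reorder variables with similar scores.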

The importance attributed to the variables suggests that the chemical and functional condition of the oil (in terms of degradation and additive content) has a greater impact on the variation among the samples than factors such as mileage or external contamination. This reinforces the idea that the clusters primarily reflect different stages of oil life and degradation, rather than equipment operating conditions.

To gain deeper insight into the internal structure and composition of each cluster identified through K-Means, a Principal Component Analysis was also applied within each cluster individually. This approach reveals which variables most strongly influence the variance structure within each group.

Table 7 summarizes the percentage of variance explained by the first three principal components (PC1, PC2, and PC3) for each cluster.

Table 7 Distribution of explained variance by cluster.


The analysis of PCA loadings helps to identify which variables most influence the variability within each cluster. By examining the top five loadings for the first two principal components (PC1 and PC2), as shown in Fig. 13, it is possible to determine the key factors that define each cluster. The presence of both positive and negative loadings indicates whether a variable contributes positively or negatively to the component, offering insight into how different variables shape the clustering structure.
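The within-cluster PCA can be sketched by repeating the decomposition on the members of each cluster and listing the signed loadings with the largest magnitudes. In the NumPy illustration below, clusters with fewer than three members are skipped, because a PCA on one sample has no variance to decompose. The three-member minimum is an arbitrary choice for the sketch. Note also that SVD determines each component only up to an overall sign, so signs should be compared within a component, not across clusters or runs.

```python
import numpy as np


def within_cluster_loadings(X, labels, names, n_pcs=2, top=5):
    """Run a separate PCA on the members of each cluster and list the top signed loadings."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        if len(Xc) < 3:
            out[c.item()] = None  # too few members (e.g. a one-sample cluster) for a within-cluster PCA
            continue
        sd = Xc.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # a variable that is constant inside the cluster is centered but not scaled
        Z = (Xc - Xc.mean(axis=0)) / sd
        _, s, Vt = np.linalg.svd(Z, full_matrices=False)
        entry = {"explained": (s**2 / np.sum(s**2))[:n_pcs].round(3).tolist()}
        for k in range(min(n_pcs, len(s))):
            # Only the relative signs within one component are meaningful
            order = np.argsort(np.abs(Vt[k]))[::-1][:top]
            entry[f"PC{k + 1}"] = [(names[j], round(float(Vt[k, j]), 3)) for j in order]
        out[c.item()] = entry
    return out
```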

Figure 13 highlights the top variables contributing to PC1 and PC2 for each cluster. The colors represent different chemical or physical oil variables, as indicated in the legend on the right of the figure.

In general, the loadings reveal distinct degradation patterns across clusters. Cluster 0 is characterized by high contributions from additive-related variables (such as phosphorus, zinc, and soot), along with notable viscosity changes - suggesting a degradation profile driven by additive depletion and soot accumulation. Cluster 1 is influenced by chemical degradation markers like TAN and oxidation, along with wear indicators such as copper and nickel, reflecting both acid formation and metallic wear. Cluster 2 shows strong associations with TAN, lead, and sulfation, indicating a more advanced stage of lubricant deterioration with significant acidification and contamination. Finally, Cluster 3, although represented by a single sample, stands out for high levels of nitration and oxidation, pointing to severe thermal and combustion-related degradation.

Figure 14 reinforces the distinctions previously identified in the degradation profiles. From the analysis of Fig. 14, it is worth highlighting the presence of extreme values in Cluster 3 for variables such as Calcium (Ca), Lead (Pb), Chromium (Cr), and Sulfur (S). These elevated levels indicate an atypical case of severe degradation, possibly associated with significant contamination, wear of metallic components, and a potential failure in combustion control or the filtration system.

Evaluation: predictive assessment of cluster membership using a perceptron model

Following the analysis of the variables most influential in defining cluster separation through PCA loadings, a supervised classification approach was implemented to gain a deeper understanding of how the clusters were formed. Although PCA and K-Means clustering enabled the segmentation of oil samples into four distinct groups, understanding the underlying criteria of this separation benefited from an additional predictive assessment.

The model is a Single-Layer Perceptron (SLP) trained to predict the cluster membership of each sample. After training, the model is interpreted through the Connection Weight Approach, which identifies the input variables most influential in defining each cluster. Although this approach is not explicitly detailed in²⁴, the theoretical insights that study provides on the stability of weight solutions support a meaningful interpretation of the weight configurations. Furthermore, Singh & Banerjee²⁴ highlight the applicability and effectiveness of simple perceptrons in classification tasks, particularly when the data are linearly separable, which supports the use of the SLP for interpretable analysis. The Connection Weight Approach was later developed and applied by other authors²⁵,²⁶.
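The text does not specify the perceptron's training rule, so the NumPy sketch below uses the classic multiclass perceptron update as a stand-in. It keeps one weight vector per cluster, predicts the cluster with the highest score, and updates weights only on mistakes. Inputs are standardized so that weights are comparable across variables. With no hidden layer, the connection-weight products of the Connection Weight Approach reduce to the direct input-to-output weights, which are normalized here per cluster. All function names are illustrative.

```python
import numpy as np


def standardize(X, mean=None, std=None):
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0, ddof=1) if std is None else std
    return (X - mean) / std, mean, std


def train_perceptron(X, y, epochs=100, lr=1.0, seed=0):
    """Multiclass single-layer perceptron on standardized inputs (one weight row per cluster)."""
    y = np.asarray(y)
    Z, mean, std = standardize(X)
    Zb = np.hstack([Z, np.ones((len(Z), 1))])  # last column = bias input
    classes = np.unique(y)
    index = {c: i for i, c in enumerate(classes)}
    W = np.zeros((len(classes), Zb.shape[1]))
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        mistakes = 0
        for i in rng.permutation(len(Zb)):
            pred, true = int(np.argmax(W @ Zb[i])), index[y[i]]
            if pred != true:  # update only on misclassification
                W[true] += lr * Zb[i]
                W[pred] -= lr * Zb[i]
                mistakes += 1
        if mistakes == 0:  # training data separated: stop early
            break
    return {"classes": classes, "W": W, "mean": mean, "std": std}


def predict(model, X):
    Z, _, _ = standardize(X, model["mean"], model["std"])
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    return model["classes"][np.argmax(Zb @ model["W"].T, axis=1)]


def connection_weight_importance(model, names):
    """Absolute input->output weights per cluster, normalized to sum to 1 (bias excluded)."""
    A = np.abs(model["W"][:, :-1])
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    return {c.item(): sorted(zip(names, row.round(3).tolist()), key=lambda t: t[1], reverse=True)
            for c, row in zip(model["classes"], A)}
```

If the clusters are not linearly separable, the perceptron rule does not converge and simply stops after `epochs` passes. A softmax (logistic) output layer trained by gradient descent would be a common alternative in that case.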

To ensure that the predictive assessment of the perceptron model was statistically reliable, the model was evaluated using K-fold cross-validation. This procedure yielded a mean accuracy of 94.85% (standard deviation 4.22%) and a mean F1-score of 86.81%, demonstrating consistent performance across multiple training–validation splits. The classification report indicated strong performance for the clusters with larger sample sizes (0, 1, and 2). Cluster 3, represented by only one sample, is not statistically meaningful: its low F1-score reflects sample imbalance rather than a deficiency of the model. These results indicate that the model generalizes well to unseen data and that the extracted feature relevance is supported by the validation metrics.
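Why a single-sample cluster depresses the F1-score far more than the accuracy can be seen with a small calculation. In any K-fold split, the fold that holds the only Cluster 3 sample trains on data with no Cluster 3 example, so that sample cannot be predicted correctly. The standard-library sketch below assumes the reported F1 is macro-averaged (the text does not say). The label counts in the example are hypothetical.

```python
def per_class_f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for every label that occurs in either list."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy_and_macro_f1(y_true, y_pred):
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1 = per_class_f1(y_true, y_pred)
    return accuracy, sum(f1.values()) / len(f1)


# Hypothetical predictions: every sample correct except the single Cluster 3 sample,
# which is assigned to Cluster 2.
y_true = [0] * 25 + [1] * 12 + [2] * 20 + [3]
y_pred = y_true[:-1] + [2]
print(accuracy_and_macro_f1(y_true, y_pred))
```

In this example, one error out of 58 costs under 2 percentage points of accuracy. The same error drives one of the four per-class F1 values to zero and pulls the macro-average down by about a quarter.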

For interpretability purposes, the relevance of each variable was assessed using the absolute normalized weights of the perceptron, allowing clear identification of the features most influential in distinguishing between clusters. This approach complements the PCA results and enhances the understanding of the degradation mechanisms characterizing each group.

Figure 15 shows the variables with the highest weight in each cluster, based on the coefficients of the perceptron. This allows the identification of the main factors that characterize and distinguish the clusters formed in the previous analysis.

The analysis reveals that each cluster represents a distinct oil degradation profile. Cluster 0 is marked by chemical degradation. Cluster 1 shows strong signs of mechanical wear, with high levels of wear metals and fuel presence. Cluster 2 indicates minimal degradation, dominated by additive-related elements and stable viscosity - suggesting early-life or well-maintained oil. Cluster 3 stands out for signs of coolant contamination and combustion by-product accumulation, reflecting severe and mixed degradation. Overall, the clusters illustrate different oil aging and wear regimes, shaped by specific combinations of chemical, physical, and mechanical factors.

The analysis of means and standard deviations in Table 8 deepens the understanding of the clusters by objectively highlighting which variables reach extreme levels in each group. While the means indicate the degree of oil and engine degradation, the standard deviations reveal the internal consistency of the clusters, indicating whether their behaviors are homogeneous or variable.
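Summary tables of this kind can be produced with the standard library alone. In the sketch below, the standard deviation of a one-member cluster is reported as `None` because the sample standard deviation is undefined for a single value (`statistics.stdev` raises an error in that case). This is the situation of Cluster 3. Field names and values in the example are hypothetical.

```python
from statistics import mean, stdev


def cluster_stats(rows, labels, variables):
    """Mean and sample standard deviation of each variable per cluster (std None for singletons)."""
    groups = {}
    for row, c in zip(rows, labels):
        groups.setdefault(c, []).append(row)
    table = {}
    for c, members in sorted(groups.items()):
        table[c] = {}
        for v in variables:
            vals = [m[v] for m in members]
            sd = stdev(vals) if len(vals) > 1 else None  # undefined for a single sample
            table[c][v] = (mean(vals), sd)
    return table


# Hypothetical samples: two in cluster 0 and a single outlier in cluster 3
rows = [{"Fe": 10, "Cu": 1}, {"Fe": 20, "Cu": 3}, {"Fe": 200, "Cu": 50}]
print(cluster_stats(rows, [0, 0, 3], ["Fe", "Cu"]))
```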

Table 8 Mean and standard deviation of key variables per cluster.


The data presented in Table 8 provides a comprehensive snapshot of the fleet’s health from the engine perspective, effectively capturing the current state of oil degradation and wear across different operational profiles.

The analysis reinforces and complements the patterns previously identified in the clusters. Cluster 0 shows signs of chemical degradation and combustion-related contamination, as indicated by high values of viscosity, oxidation, nitration, and metal wear (Fe).

Cluster 1 reveals severe internal component wear, with very high levels of copper, iron, and other metals, suggesting mechanical failure. Notably, there is also the presence of diesel fuel contamination (in only one vehicle), indicating fuel infiltration into the engine. Additionally, water contamination was detected, which is critical because the simultaneous presence of both can accelerate oil degradation processes, reduce lubricant viscosity, compromise the lubricating film, and promote internal corrosion of engine components. This combination poses a significant risk of accelerated wear and potential engine failure if not addressed promptly.

Cluster 2 demonstrates preservation of additives (P, Zn, Ca, Mo) and low wear, representing systems in good condition or at an early stage of use.

In contrast, Cluster 3 displays extreme values of sulfur, calcium, zinc, and other metals, indicating an atypical case of severe and multiple degradation processes, likely involving contamination and critical failures. Because this cluster contains a single case, no standard deviation can be computed. Nevertheless, the values indicate highly anomalous behavior, so Cluster 3 can be regarded as an outlier.

Discussion

The analysis conducted on the Lukoil 10W40 lubricating oil, using statistical methods and unsupervised learning techniques, enabled the identification of robust degradation patterns associated with varying operational conditions in urban bus engines. By focusing on a single oil formulation, exogenous variability was significantly reduced, allowing for a more rigorous assessment of the impact of operational variables and internal mechanisms of wear and lubricant aging.

The application of Principal Component Analysis revealed that most of the total variance in the dataset could be explained by a reduced number of latent components, primarily influenced by variables related to chemical degradation of the oil (TAN, oxidation, nitration), metallic wear (Fe, Cu, Ni, Pb), and additive depletion (Zn, P, S, Ca). The loading structure showed that the first principal component (PC1) was strongly associated with wear and additive-related elements, while PC2 represented acid oxidation processes and metal contamination. These variables were instrumental in driving the formation of the clusters through K-Means.

The segmentation into four clusters allowed the identification of distinct operational profiles.

Cluster 0 included only Cummins engines operating under urban conditions. These engines are subject to frequent stop-and-go cycles and extended idle times, which are known to influence lubricant degradation differently from interurban usage. This suggests that operational context plays a critical role in the condition and behavior of the lubricant.

Clusters 1 and 2 consisted of buses operating under interurban conditions. However, important distinctions emerged: Cluster 1 grouped only new engines, mostly from 2022 to 2023, many of which were likely in their break-in period. These engines exhibited higher levels of wear metals and debris, which is consistent with the expected behavior during early engine operation. Specifically, elevated copper concentrations were observed, indicating bearing wear and potential bronze bushing break-in activity. Copper levels in these new engines ranged significantly above normal operational thresholds, suggesting active surface conditioning of engine components. Several engines in this cluster were still within their first or second oil change intervals. Given the high copper concentrations and elevated debris levels characteristic of the break-in period, it is advisable to significantly reduce oil change intervals during this critical phase - potentially shortening intervals to 50–75% of standard recommendations to prevent accelerated wear and ensure optimal engine longevity.

In contrast, Cluster 2 represented older engines (pre-2022), which had undergone more regular oil change cycles and demonstrated a more stable lubricant degradation profile. This cluster appears to reflect standard, long-term operational behavior.

Cluster 3, although initially grouped as a cluster, displayed markedly different characteristics. It corresponded to a single, older bus with high levels of contamination and lubricant degradation due to a known oil leak. Given its unique condition, it is more appropriate to consider Cluster 3 as an outlier rather than a true operational cluster.

These insights highlight the ability of clustering methods to reveal patterns not immediately visible through individual variables. Most notably, the clustering differentiated operational stages (new vs. mature engines), usage types (urban vs. interurban), and even flagged a mechanical fault (oil leak).

Based on these operational patterns, the results suggest that optimized oil-change strategies could reduce unnecessary replacements by approximately 10–20% in stable interurban engines, while new engines in the break-in period may benefit from temporarily shortening intervals by 25–50% to prevent premature wear.

The results should be interpreted considering statistical limitations such as sample-size constraints and partial sample dependence, since multiple samples derive from the same engines. These factors may influence cluster generalization.

These observations are consistent with recent findings reported in tribological and fleet-monitoring studies¹⁵,²¹, which similarly emphasize the relevance of operational context in lubricant degradation behavior.

Compared to previous studies, the multivariate approach captured operational differences (urban vs. interurban) that are not explained by mileage alone. Future work should integrate telematics and fuel-dilution data to enhance predictive power.

Conclusion

This study aimed to analyze the behavior of Lukoil 10W40 lubricating oil in passenger bus engines through the application of multivariate statistical techniques and unsupervised learning methods. The methodological contribution of this work lies in the combined use of Principal Component Analysis, K-Means clustering, and Connection Weight Approach, which enabled a structured identification of degradation patterns not detectable through traditional univariate indicators.

The segmentation of the data into four clusters revealed distinct operational profiles.

The results reinforce the importance of adopting Condition-Based Maintenance (CBM) strategies, relying on real oil condition indicators rather than generic metrics such as total mileage. Correlation analysis showed that mileage had a weak or nonexistent relationship with wear metal concentrations, contradicting commonly used assumptions in maintenance planning.

To support CBM implementation, the findings indicate specific recommendations: shortening oil-change intervals for new engines in the break-in phase (Cluster 1) and prioritizing the monitoring of soot and oxidation for buses operating under high-idle urban conditions (Cluster 0).

Factors such as partial oil replenishment, fuel quality, driving style, and overall engine condition appear to be more decisive in influencing oil degradation. This highlights the need for more sensitive and customized predictive approaches.

Thus, the application of methods such as those presented in this study enables a differentiated and intelligent management of lubricants, contributing to increased operational safety, reduction of premature wear, better use of technical resources, and greater economic efficiency in fleet management.

Despite the promising results, this work represents an initial level of analysis. The main limitation lies in the small dataset size, which, while sufficient for operational profile segmentation, is not yet adequate for training reliable predictive models. The use of supervised learning techniques to forecast oil condition based on the analyzed variables remains unfeasible at this stage but represents a promising direction for future research.

In summary, this pioneering analysis lays the groundwork for an innovative approach to bus maintenance, promoting greater reliability and efficiency in urban public transportation systems.

The findings support revising oil-change intervals according to chemical condition indicators rather than mileage, particularly shortening intervals for new engines (Cluster 1) and monitoring soot/oxidation in high-idle buses (Cluster 0).

Data availability

The data used in this article come from an urban transport company in Poland; the authors are not authorized to make them publicly available.

References

Gołębiowski, W., Wolak, A. & Šarkan, B. Engine oil degradation in the Real-World bus fleet test based on two consecutive operational intervals. Lubricants 12 (3). https://doi.org/10.3390/lubricants12030101 (2024).
Lenza, T. L. L., Ruggiero, A., Senatore, A. & Siano, P. Some results on the used lubricants analysis of urban bus diesel engines. Adv. Transp. 16, 541–550 (2004).
Gołębiowski, W., Zając, G., Sejkorová, M. & Wolak, A. Assessment of oil change intervals in urban buses based on the selected physicochemical properties of used engine oils. Combust. Engines. 196 (1), 15–23. https://doi.org/10.19206/CE-169807 (2024).
Raposo, H., Farinha, J. T., Fonseca, I. & Ferreira, L. A. Condition monitoring with prediction based on diesel engine oil analysis: A case study for urban buses. Actuators 8 (1). https://doi.org/10.3390/act8010014 (2019).
Schutz, D. B. Utilização da análise de óleo lubrificante como ferramenta da engenharia de manutenção (Universidade Federal do Rio Grande do Sul, 2008).
Rodrigues, J., Costa, I., Farinha, J. T., Mendes, M. & Margalho, L. Predicting motor oil condition using artificial neural networks and principal component analysis. Eksploatacja i Niezawodność 22 (3), 440–448 (2020).
Gołębiowski, W., Wolak, A. & Zając, G. Preventive maintenance in urban public transport: the role of engine oil analysis. Sci. Rep. 14 (1), 30894. https://doi.org/10.1038/s41598-024-81728-w (2024).
Wolak, A. & Krasodomski, W. Reducing oil waste through Condition-Based maintenance: A diagnostic study using FTIR and viscosity monitoring. Sustainability 17 (18). https://doi.org/10.3390/su17188214 (2025).
Rappaport, S. T., Ferner, M. D., Hecker, L. S. & Tierney, T. B. Evaluation of API/ILSAC GF-4 oil life in today’s US fleet. In 2008 SAE International Powertrains, Fuels and Lubricants Congress. https://doi.org/10.4271/2008-01-1740 (2008).
Karanović, V. V., Jocanović, M. T., Wakiru, J. M. & Orošnjak, M. D. Benefits of lubricant oil analysis for maintenance decision support: A case study. IOP Conf. Ser. Mater. Sci. Eng. 393 (1). https://doi.org/10.1088/1757-899X/393/1/012013 (2018).
Tormos, B., Olmeda, P., Gómez, Y. & Galar, D. Monitoring and analysing oil condition to generate maintenance savings: a case study in a CNG engine powered urban transport fleet. Insight - Non-Destructive Testing and Condition Monitoring 55 (2), 84–87. https://doi.org/10.1784/insi.2012.55.2.84 (2013).
Macian Martinez, V., Tormos Martinez, B. V., Gomez, Y. & Bermudez Tamarit, V. Revisión del proceso de la degradación en los aceites lubricantes en motores de gas natural comprimido y diesel. Dyna Ingeniería e Industria 88 (3), 49–58. https://doi.org/10.6036/5077 (2013).
Macián, V., Tormos, B., Olmeda, P. & Gómez, Y. A. Findings from a fleet test on the performance of two engine oil formulations in automotive CNG engines. Lubr. Sci. 27 (1), 15–28. https://doi.org/10.1002/ls.1248 (2015).
Macian, V., Tormos, B., Miró, G. & Pérez, T. Assessment of low-viscosity oil performance and degradation in a heavy duty engine real-world fleet test. Proc. Inst. Mech. Eng. Part J: J. Eng. Tribol. 230 (6), 729–743. https://doi.org/10.1177/1350650115619612 (2015).
Raposo, H. & Galar, D. Prediction condition based on oil analysis - a case study. Tribol. Int. https://doi.org/10.1016/j.triboint.2019.01.041 (2019).
Nagy, A. L. et al. Rapid fleet condition analysis through correlating basic vehicle tracking data with engine oil ft-ir spectra. Lubricants 9 (12). https://doi.org/10.3390/lubricants9120114 (2021).
Omiya, T., Hanyuda, K. & Nagatomi, E. Predicting engine oil degradation across diverse vehicles and identifying key factors. Mech. Syst. Signal. Process. 229, 112524. https://doi.org/10.1016/j.ymssp.2025.112524 (2025).
Viana de Sousa, E. H. Análise preditiva a partir da caraterização das emissões gasosas e do óleo lubrificante em frotas com motorização a diesel (Universidade Federal do Rio Grande do Norte, 2010).
Kimura, R. Uso da Técnica de Análise de Óleo Lubrificante em Motores Diesel Estacionários, Utilizando-se Misturas de Biodiesel e Diferentes Níveis de Contaminação do Lubrificante, Athena.Biblioteca.Unesp.Br, p. 128, http://www.athena.biblioteca.unesp.br/exlibris/bd/bis/33004099082P2/2010/kimura_rk_me_ilha.pdf (2010).
Domínguez-García, S., Béjar-Gómez, L., López-Velázquez, A., Maya-Yescas, R. & Nápoles-Rivera, F. Maximizing lubricant life for internal combustion engines. Processes 10 (10). https://doi.org/10.3390/pr10102070 (2022).
Ramirez Camba, R., Garcia Garcia, C., Garcia Tobar, M. & Merchan, J. F. An integrated methodological approach for interpreting used oil analysis in diesel engines. Lubricants 13 (4). https://doi.org/10.3390/lubricants13040169 (2025).
Nguyen, V. T., Furch, J. & Koláček, J. Using multiple linear regression to predict engine oil life. Sci. Rep. 15, 33585. https://doi.org/10.1038/s41598-025-18745-w (2025).
Wolak, A., Krasodomski, W. & Zając, G. FTIR analysis and monitoring of used synthetic oils operated under similar driving conditions. Friction 8 (5), 995–1006. https://doi.org/10.1007/s40544-019-0344-9 (2020).
Coetzee, F. M. & Stonick, V. L. On the uniqueness of weights in single-layer perceptrons. IEEE Trans. Neural Netw. 7 (2), 318–325. https://doi.org/10.1109/72.485635 (1996).
Olden, J. D. & Jackson, D. A. Illuminating the ‘black box’: a randomization approach for understanding variable contributions in artificial neural networks. Ecol. Modell. https://doi.org/10.1016/S0304-3800(02)00064-9 (2002).
Guiné, R. P. F., Matos, S., Gonçalves, F. J., Costa, D. & Mendes, M. Evaluation of phenolic compounds and antioxidant activity of blueberries and modelization by artificial neural networks. Int. J. Fruit Sci. 18 (2), 199–214. https://doi.org/10.1080/15538362.2018.1425653 (2018).

Funding

The publication was funded by appropriations from the Faculty of Production Engineering, University of Life Sciences in Lublin, to maintain research potential.

Author information

Authors and Affiliations

Polytechnic University of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes, 3030-199 Coimbra, Portugal
Margarida Oliveira Duarte, Luís Melo Margalho, Mateus Mendes & José Manuel Torres Farinha
RCM2+, Polytechnic University of Coimbra, Rua Pedro Nunes, 3030-199 Coimbra, Portugal
Margarida Oliveira Duarte, Luís Melo Margalho, Mateus Mendes & José Manuel Torres Farinha
Department of Power Engineering and Transportation, University of Life Sciences in Lublin, Gleboka 28, Lublin, 20-612, Poland
Wojciech Gołębiowski
Institute of Systems and Robotics, Department of Electrical and Computer Engineering, ISR, University of Coimbra, Coimbra, 3030-290, Portugal
Mateus Mendes
Department of Road and Urban Transport, Faculty of Operation and Economics of Transport and Communications, University of Zilina, Žilina, Slovakia
Branislav Šarkan

Contributions

M.D. Data Analysis, Manuscript Writing, Research. L.M. Manuscript Review, Supervision. W.G. Conceptualization, Methodology, Manuscript Review. M.M. Supervision, Conceptualization, Methodology, Manuscript Review. T.F. Supervision, Conceptualization, Methodology, Manuscript Review. B.S. Data Provision, Manuscript Review.

Corresponding author

Correspondence to Margarida Oliveira Duarte.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Duarte, M.O., Margalho, L.M., Gołębiowski, W. et al. Monitoring the condition of city bus engines by analysing used oil using PCA and K-Means clustering. Sci Rep 16, 9392 (2026). https://doi.org/10.1038/s41598-026-39045-x

Received: 31 May 2025
Accepted: 02 February 2026
Published: 17 February 2026
Version of record: 19 March 2026
DOI: https://doi.org/10.1038/s41598-026-39045-x
