Introduction

Gas-polyethylene (PE) pipes are widely utilized in gas transmission and distribution systems due to their excellent corrosion resistance and flexibility. However, critical quality indicators, including wall thickness, inner diameter, outer diameter, concentricity, and ovality, are highly susceptible to variations in process parameters (e.g., barrel temperature and head pressure). Such variations impair installation quality and construction efficiency and reduce the pipe’s pressure-bearing capacity, posing potential risks to system safety1,2,3. Hence, maintaining a precise and reliable measurement system in PE pipe production is paramount. Measurement System Analysis (MSA) is a crucial tool for evaluating the reliability of measurement systems. Traditional univariate MSA assesses a single quality indicator (e.g., height or weight) and employs statistical techniques such as analysis of variance (ANOVA) to evaluate the measurement system’s performance from multiple perspectives. While this method is simple and efficient, making it highly effective in single-dimensional quality control4, its applicability diminishes as manufacturing processes become increasingly complex. The intricate interactions among multidimensional quality indicators prevent univariate MSA from fully capturing the measurement system’s performance, rendering it insufficient for the demands of complex manufacturing scenarios5.

Multivariate Measurement System Analysis has become a prominent research focus6,7. Currently, the predominant methodologies for multivariate measurement system analysis are grounded in dimensionality reduction techniques, with principal component analysis (PCA) and projection pursuit (PP) being among the most commonly used approaches. PCA reduces data dimensionality by extracting principal components through a linear transformation, effectively capturing the most significant data features. He et al.8 proposed an online multivariate measurement system analysis method based on PCA, in which multivariate measurement data are transformed into independent principal components to assess the performance of testing instruments. This approach significantly enhances the efficiency of multidimensional data analysis. However, PCA assumes a linear relationship among variables, making it less suitable for complex nonlinear datasets. Gewers et al.9 highlight that PCA may result in substantial information loss when applied to data exhibiting strong nonlinear interactions. The PP method, on the other hand, identifies the optimal projection direction by maximizing the variability in the data, making it well-suited for datasets with non-normal distributions or nonlinear structures. Li Xiaoping et al.10 introduced a multivariate measurement system capability analysis method based on projection pursuit, which accounts for subjective preferences and correlations among quality characteristics, providing a more accurate assessment framework for complex measurement systems. However, PP’s sensitivity to parameter selection and high computational complexity limit its practical application in industrial settings11.
While both PCA and PP substantially simplify multidimensional data analysis, their inherent limitations—particularly the potential loss of critical information during the dimensionality reduction process—constrain their ability to evaluate multivariate measurement systems comprehensively.

With the continuous advancement of dimensionality reduction techniques, nonlinear dimensionality reduction methods have become essential tools for analyzing high-dimensional complex data. In recent years, nonlinear algorithms such as t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) have demonstrated notable advantages in capturing nonlinear relationships and handling non-normally distributed data12. t-SNE preserves the local neighborhood structure of data by minimizing the Kullback-Leibler divergence between high-dimensional and low-dimensional distributions, making it widely applicable in fields such as life prediction13, fault diagnosis14, and data visualization15. However, t-SNE has several limitations, including weak global structure preservation, susceptibility to blurred class boundaries in multi-clustered data, high computational complexity, and sensitivity to hyperparameters, which may compromise its stability and efficiency when processing large-scale datasets16. In contrast, UMAP offers superior dimensionality reduction performance and computational efficiency. Leveraging Riemannian geometry and algebraic topology, UMAP constructs a high-dimensional neighborhood graph and optimizes its low-dimensional mapping, thereby preserving local data structures while maintaining global relationships17. UMAP more effectively delineates the boundaries between different quality characteristic clusters in industrial multivariate production data while minimizing local detail loss. Compared to t-SNE, UMAP exhibits lower computational complexity and greater efficiency in handling large-scale datasets17. Moreover, UMAP demonstrates reduced sensitivity to hyperparameter selection, often yielding stable results with default settings. 
Within the context of smart manufacturing, UMAP has been leveraged to analyze multidimensional quality characteristic data, offering superior dimensionality reduction efficiency and visualization performance compared to t-SNE18. Despite its advantages in processing nonlinear and non-normally distributed data, UMAP has an inherent limitation: it assigns equal weight to all features by default, overlooking the differential significance of various quality characteristics in measurement system analysis. In multivariate measurement system analysis, individual quality characteristics may exert differing influences on system performance, particularly in the presence of significant interactions. Assigning appropriate weights to features can enhance the accuracy of the analysis. Consequently, this limitation of UMAP constrains its applicability in multivariate measurement system analysis19.

In light of the limitation of the UMAP algorithm, which assigns uniform weights to all quality characteristics by default, this paper proposes an improved UMAP algorithm with variable weights. The proposed algorithm optimizes and assigns adaptive weights to multiple quality characteristics by introducing a weight optimization mechanism. Moreover, traditional PCA algorithms exhibit limited structural preservation and applicability when dealing with manufacturing data with significant nonlinear relationships and complex interactions. In addition, existing multivariate MSA methods generally lack a unified and integrated analytical framework, hindering the implementation of an end-to-end workflow encompassing data preprocessing, nonlinear dimensionality reduction, and system capability evaluation in real-world manufacturing contexts. The fragmented structure of such methods compromises the analysis’s practicality and interpretability. Based on this, an integrated analysis model for multiple measurement systems based on Variable-Weight Uniform Manifold Approximation and Projection (VUMAP) is developed. The key contributions of this study are outlined as follows:

(1) A dynamic weight-assignment method based on the VUMAP algorithm is proposed to address differences in the importance of quality characteristics. The method combines the K-Means++ clustering technique with the Particle Swarm Optimization (PSO) technique, divides quality characteristics into categories through spatial-similarity clustering, and optimizes the weight allocation based on the spatial distribution characteristics. In this way, it characterizes the actual contribution of each quality characteristic more accurately than the traditional uniform weighting method.

(2) An improved VUMAP-based method is proposed to address the loss of nonlinear information during data dimensionality reduction. By optimizing the intrinsic topology of high-dimensional data and integrating it with the dynamic weighting mechanism, the method processes quality characteristics differentially, effectively preserving nonlinear relationships and complex data features while mitigating the information distortion and error accumulation of traditional methods (e.g., PCA). Experiments demonstrate that VUMAP outperforms traditional methods in maintaining the structural integrity of data and characterizing the interaction of quality characteristics.

(3) To overcome the limitations of traditional multivariate measurement system analysis models, an integrated analysis framework is proposed, comprising three core modules: data preprocessing and correlation analysis, dynamically weighted VUMAP dimensionality reduction, and multivariate capability assessment. By integrating anomaly detection and multi-dimensional correlation analysis, the model can accurately identify the correlation characteristics of quality features, utilize the improved VUMAP algorithm to preserve the nonlinear data structure, and establish an assessment system that integrates visualization and quantitative indicators. The model overcomes the limitations of traditional methods in analyzing complex data structures, enhances the comprehensiveness and accuracy of measurement system analysis through modular synergy, and offers a reliable quality-management tool for intelligent manufacturing.

Methods

Uniform manifold approximation and projection (UMAP)

UMAP (Uniform Manifold Approximation and Projection)17,20,21 is an emerging dimensionality reduction algorithm aimed at preserving both global and local structural characteristics of high-dimensional data as much as possible. The algorithm integrates algebraic topology and Riemannian geometry to achieve high-to-low-dimensional mapping by establishing a weighted network framework in high-dimensional space based on an optimization objective. The UMAP algorithm exhibits high result stability and strong adaptability; however, it cannot assign variable weights to different features of the input data, which constrains its application in complex industrial measurements. In this paper, the term low-dimensional space refers to the target projection space resulting from the dimensionality reduction process. In contrast, low-dimensional embeddings denote the coordinate representations of the sample points in this space. The primary procedures of the algorithm include:

Construct a neighborhood graph for high-dimensional data

Given a high-dimensional input dataset \(\:X=\left\{{x}_{1},{x}_{2},\cdots\:,{x}_{N}\right\}\), the neighborhood of any sample point \(\:{x}_{i}\) is defined as follows:

$$\:{\rho\:}_{i}={min}\left\{{d}_{E}({x}_{i},{x}_{j})|1\le\:j\le\:k,{d}_{E}({x}_{i},{x}_{j})>0\right\}$$
(1)
$$\:{\sigma\:}_{i}=\frac{1}{k-1}\sum\:_{j=1}^{k}\text{max}\left(0,{d}_{E}({x}_{i},{x}_{j})-{\rho\:}_{i}\right)$$
(2)

Where \(\:k\) is the number of neighbors, \(\:{d}_{E}({x}_{i},{x}_{j})\) represents the Euclidean distance between \(\:{x}_{i}\) and \(\:{x}_{j}\), \(\:{\rho\:}_{i}\) is the distance to the nearest neighbor, and \(\:{\sigma\:}_{i}\) is the scale factor.

Specifically, a neighbor is a sample point close to the target point in high-dimensional space, while a neighborhood is the collection of such neighbors, typically determined by the number of nearest neighbors \(k\). This concept is illustrated in Fig. 1.

Fig. 1

Illustration of “neighbor” and “neighborhood” in UMAP.

Define the similarity probability in high-dimensional space:

$$\:{p}_{ij}=exp\left(-\frac{{max}(0,{d}_{E}({x}_{i},{x}_{j})-{\rho\:}_{i})}{{\sigma\:}_{i}}\right)$$
(3)

Construct an undirected weighted neighborhood graph with edge weights determined by symmetrized similarity probabilities:

$$\:{w}_{ij}={p}_{ij}+{p}_{ji}-{p}_{ij}{p}_{ji}$$
(4)
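As an illustration, the graph construction of Eqs. (1)–(4) can be sketched in a few lines of NumPy. This is a minimal didactic version that follows the equations as written; the reference UMAP implementation selects \(\sigma_i\) by a binary search rather than Eq. (2)'s average.

```python
import numpy as np

def neighborhood_graph(X, k=3, eps=1e-12):
    """Build the symmetrized fuzzy neighborhood graph of Eqs. (1)-(4)."""
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise d_E
    P = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]                 # k nearest neighbors of x_i
        d = D[i, nbrs]
        rho = d[d > 0].min()                             # Eq. (1): nearest-neighbor distance
        sigma = np.maximum(0.0, d - rho).sum() / (k - 1) # Eq. (2): scale factor
        P[i, nbrs] = np.exp(-np.maximum(0.0, d - rho) / (sigma + eps))  # Eq. (3)
    return P + P.T - P * P.T                             # Eq. (4): symmetrized weights
```

Because Eq. (4) has the form \(1-(1-p_{ij})(1-p_{ji})\), the resulting weights are guaranteed to stay in \([0,1]\) and the graph is symmetric by construction.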

Define the similarity distribution in the low-dimensional space

In the low-dimensional space, the similarity between embedded sample points is modeled by the following probability distribution:

$$q_{ij}=\frac{1}{1+a\left\|y_{i}-y_{j}\right\|^{2b}}$$
(5)

where \(\parallel y_{i}-{y}_{j} \parallel\) represents the Euclidean distance between sample points \(\:{y}_{i}\) and \(\:{y}_{j}\) in the low-dimensional space, \(\:a=1.929\) and \(\:b=0.7915\) are empirically determined hyperparameters17.

The objective function is optimized by minimizing the difference in entropy between the distributions in high- and low-dimensional spaces:

$$C=\sum_{i\ne j}\left[w_{ij}\log\left(\frac{w_{ij}}{q_{ij}}\right)+(1-w_{ij})\log\left(\frac{1-w_{ij}}{1-q_{ij}}\right)\right]$$
(6)
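Eqs. (5)–(6) transcribe directly into code; this sketch assumes the empirical defaults \(a=1.929\), \(b=0.7915\) cited above.

```python
import numpy as np

A, B = 1.929, 0.7915  # empirical hyperparameters of Eq. (5)

def q_matrix(Y):
    """Low-dimensional similarities, Eq. (5): 1 / (1 + a * ||y_i - y_j||^(2b))."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared distances
    return 1.0 / (1.0 + A * d2 ** B)                     # (d2)^b == ||.||^(2b)

def cross_entropy(W, Q, eps=1e-12):
    """Objective of Eq. (6), summed over i != j; clipping guards the logs."""
    m = ~np.eye(len(W), dtype=bool)
    w = np.clip(W[m], eps, 1 - eps)
    q = np.clip(Q[m], eps, 1 - eps)
    return float(np.sum(w * np.log(w / q) + (1 - w) * np.log((1 - w) / (1 - q))))
```

Each pairwise term is the KL divergence between two Bernoulli distributions, so the objective is non-negative and vanishes only when the low-dimensional similarities match the graph weights exactly.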

Optimize low-dimensional embeddings and result output

The gradient descent method minimizes the objective function \(\:C\) and iteratively updates the positions of the low-dimensional embedding points.

The gradient of the objective function is computed, incorporating both the attractive and repulsive forces, with the attractive force defined as:

$$\nabla C_{attractive}=\frac{-2ab\left\|y_{i}-y_{j}\right\|_{2}^{2(b-1)}}{1+\left\|y_{i}-y_{j}\right\|_{2}^{2}}\,\omega(x_{i},x_{j})\,(y_{i}-y_{j})$$
(7)

Here, \(\:a\) and \(\:b\) represent the hyperparameters. The repulsive force is expressed as:

$$\nabla C_{repulsive}=\frac{2b\left[1-\omega(x_{i},x_{j})\right](y_{i}-y_{j})}{\left(\epsilon+\left\|y_{i}-y_{j}\right\|_{2}^{2}\right)\left(1+a\left\|y_{i}-y_{j}\right\|_{2}^{2b}\right)}$$
(8)

Here, \(\epsilon\) is a small positive constant that prevents the denominator from vanishing.

In summary, the gradient of the objective function \(\:C\) with respect to the embedding point \(\:{y}_{i}\) is composed of an attractive force and a repulsive force. The total gradient expression is:

$$\:\frac{\partial\:C}{\partial\:{y}_{i}}=\nabla\:{C}_{attractive}+\nabla\:{C}_{repulsive}$$
(9)

Update the position of the point based on the gradient:

$$\:{y}_{i}^{(t+1)}={y}_{i}^{\left(t\right)}-\eta\:\frac{\partial\:C}{\partial\:{y}_{i}}$$
(10)

Here, \(\:\eta\:\) represents the learning rate.

The optimization process continues iteratively until the objective function \(\:C\) converges or the predefined iteration limit is attained, resulting in an embedding in a lower-dimensional space.
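The iterative update of Eqs. (7)–(10) can be sketched as follows. This minimal version applies the equations exactly as written above, as full-batch gradient descent with a fixed learning rate, whereas production implementations use stochastic, edge-sampled updates.

```python
import numpy as np

def update_embedding(W, Y, a=1.929, b=0.7915, eta=0.01, eps=1e-3, iters=10):
    """Iterate the gradient step of Eqs. (7)-(10) on the embedding Y (N x 2)."""
    for _ in range(iters):
        diff = Y[:, None, :] - Y[None, :, :]        # y_i - y_j
        d2 = (diff ** 2).sum(-1) + np.eye(len(Y))   # ||.||^2; 1 on diagonal avoids 0^negative
        att = -2 * a * b * d2 ** (b - 1) / (1 + d2) * W           # Eq. (7), scalar part
        rep = 2 * b * (1 - W) / ((eps + d2) * (1 + a * d2 ** b))  # Eq. (8), scalar part
        np.fill_diagonal(att, 0.0)                  # no self-interaction
        np.fill_diagonal(rep, 0.0)
        grad = ((att + rep)[..., None] * diff).sum(axis=1)        # Eq. (9)
        Y = Y - eta * grad                                        # Eq. (10)
    return Y
```

Note that \(d2^{b-1}=\|\cdot\|^{2(b-1)}\) and \(d2^{b}=\|\cdot\|^{2b}\), so the squared-distance matrix is reused for both exponent terms.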

Variable-weight uniform manifold approximation and projection (VUMAP)

VUMAP integrated strategy and improvement framework

The uniform manifold approximation and projection (UMAP) algorithm employs Euclidean distance as the metric for dimensionality reduction. It assigns uniform weights to all data characteristics, failing to fully leverage the distinctions in data distributions. We propose an enhanced method to address this limitation: variable-weight uniform manifold approximation and projection (VUMAP). This approach introduces the Minkowski distance, dynamic weighting, and an optimization strategy to achieve effective dimensionality reduction of multivariate data, enhancing the preservation of data characteristics and overall reduction performance. The specific improvement framework is shown in Fig. 2.

Fig. 2

VUMAP improvement framework diagram.

First, the Minkowski distance is employed as the distance metric to yield greater relative distance variations than Euclidean distance19. Then, based on the distribution of these distances, a weighting procedure is applied to enhance the discrimination between data instances, thereby improving the retention of high-dimensional information and enhancing the effectiveness of the dimensionality reduction process. Additionally, the Minkowski distance can accommodate variations in scale and statistical characteristics across features by tuning the order parameter \(\:p\), thus exhibiting enhanced representational capacity.

To objectively categorize the pairwise distances among samples, all unique distance values were extracted from the high-dimensional Minkowski distance matrix and clustered into three groups using the K-Means++ algorithm19,23,24. The resulting clusters were then sorted in ascending order of their centroid values, and the midpoints between adjacent centroids were taken as the distance thresholds for classification (\(\tau_{1}\) and \(\tau_{2}\)).
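Assuming scikit-learn's `KMeans` with k-means++ initialization as the clustering backend, the threshold derivation described above can be sketched as:

```python
import numpy as np
from sklearn.cluster import KMeans

def distance_thresholds(D):
    """Derive (tau_1, tau_2) by K-Means++ clustering of unique pairwise distances."""
    vals = np.unique(D[np.triu_indices_from(D, k=1)]).reshape(-1, 1)
    km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(vals)
    c = np.sort(km.cluster_centers_.ravel())      # near < medium < far centroids
    return (c[0] + c[1]) / 2, (c[1] + c[2]) / 2   # midpoints between adjacent centroids
```

Only the upper triangle of the distance matrix is used, so duplicated symmetric entries and the zero diagonal do not bias the clustering.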

In the VUMAP dimensionality reduction algorithm, weight allocation optimization relies on the particle swarm optimization algorithm25. PSO adaptively optimizes the weights assigned to the three distance categories, thereby minimizing the structural discrepancy between the weighted and original distance matrices. The mathematical details of PSO, including the update equations and optimization objective, are presented in the next subsection (Eqs. 14–16).

Compared to the original UMAP algorithm, which uses Euclidean distance by default and assigns equal weights to all features, the VUMAP framework proposed in this paper introduces key improvements in three major aspects. First, the Minkowski distance addresses the limitations of Euclidean distance in handling feature scale differences by amplifying relative distance disparities, thereby improving the discriminative capacity of the similarity computation. Second, the K-Means++ clustering algorithm automatically partitions sample distances into three tiers (“near,” “medium,” and “far”), replacing manually defined thresholds and determining two critical thresholds (\(\tau_{1}\) and \(\tau_{2}\)) that guide subsequent weighting. Finally, the PSO algorithm adaptively optimizes the weights for the different distance levels by minimizing the structural error (MSE) between the weighted and original distances, ensuring that the resulting embedding faithfully represents the data structure. The synergistic integration of these three components significantly enhances VUMAP’s capabilities in global structure preservation, feature weighting, and preservation of variable interaction information, making it suitable for precise analysis of complex and diverse measurement systems.

VUMAP overall process

The flowchart of the VUMAP algorithm is illustrated in Fig. 3. The key steps are outlined as follows:

Step 1: Data preprocessing

The high-dimensional dataset is normalized to mitigate the impact of varying characteristic scales on the dimensionality reduction process.

$$\:{x}_{i,u}^{{\prime\:}}=\frac{{x}_{i,u}-{min}({X}_{u})}{{max}({X}_{u})-{min}({X}_{u})}$$
(11)

Here, \(x_{i,u}\) denotes the value of the u-th characteristic of sample \(x_{i}\), while \(\min(X_{u})\) and \(\max(X_{u})\) represent the minimum and maximum values of the u-th characteristic, respectively.

Step 2: Minkowski distance

To better differentiate features, the Minkowski distance is employed in place of the Euclidean distance, as follows:

$$\:{d}_{M}({x}_{i},{x}_{j})={\left({\sum\:}_{u=1}^{d}|{x}_{i,u}^{{\prime\:}}-{x}_{j,u}^{{\prime\:}}{|}^{p}\right)}^{\frac{1}{p}}$$
(12)

Here, \(\:p\) is the order parameter for the Minkowski distance, and the significance of dimensional characteristics is regulated by adjusting \(\:p\).
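Steps 1 and 2 combine into a few lines of NumPy; this sketch assumes no feature is constant, so the denominator of Eq. (11) is nonzero.

```python
import numpy as np

def minkowski_matrix(X, p=3):
    """Min-max normalize each feature (Eq. 11), then pairwise Minkowski distances (Eq. 12)."""
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    diff = np.abs(Xn[:, None, :] - Xn[None, :, :])
    return (diff ** p).sum(axis=-1) ** (1.0 / p)
```

With `p=2` this reduces to the ordinary Euclidean distance on the normalized data, so the order parameter can be tuned without changing the rest of the pipeline.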

Step 3: Dynamic weighting based on distribution characteristics

Weights are assigned to distances based on their distribution characteristics. K-Means++ is applied to cluster the distances into three categories: near, medium, and far. These distances are then weighted, and the PSO algorithm is employed to determine the optimal weights for each category:

$$\:{w}_{ij}=\left\{\begin{array}{cc}{w}_{1}\times\:{d}_{M}({x}_{i},{x}_{j}),&\:{d}_{M}<{\tau\:}_{1}\\\:{w}_{2}\times\:{d}_{M}({x}_{i},{x}_{j}),&\:{\tau\:}_{1}\leqslant{d}_{M}<{\tau\:}_{2}\\\:{w}_{3}\times\:{d}_{M}({x}_{i},{x}_{j}),&\:{d}_{M}\geqslant {\tau\:}_{2}\end{array}\right.$$
(13)

Here, \(\:{\tau\:}_{1}\) and \(\:{\tau\:}_{2}\) are thresholds that differentiate near, medium, and far distances, defining the three categories; \(\:{w}_{1}(>1)\) is the weight for near distances; \(\:{w}_{2}(=1)\) is the weight for medium distances; \(\:{w}_{3}(<1)\) is the weight for far distances.
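The piecewise rule of Eq. (13) is a nested `np.where`; the values `w1 = 1.5` and `w3 = 0.5` below are illustrative placeholders only, since in VUMAP the weights are supplied by the PSO stage.

```python
import numpy as np

def weight_distances(D, tau1, tau2, w1=1.5, w2=1.0, w3=0.5):
    """Eq. (13): scale near/medium/far distances by w1 (>1), w2 (=1), w3 (<1)."""
    return np.where(D < tau1, w1 * D, np.where(D < tau2, w2 * D, w3 * D))
```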

PSO is a swarm-intelligence optimization method in which candidate solutions (particles) move stochastically through the feasible domain in search of the global optimum. Assuming that the position of a particle is \(x\) and its velocity is \(v\), the update formulas for position and velocity are:

$$\:{v}_{i}^{t+1}=\omega\:{v}_{i}^{t}+{c}_{1}{r}_{1}\left({p}_{i}^{t}-{x}_{i}^{t}\right)+{c}_{2}{r}_{2}\left({g}^{t}-{x}_{i}^{t}\right)$$
(14)
$$\:{x}_{i}^{t+1}={x}_{i}^{t}+{v}_{i}^{t+1}$$
(15)

Here, \(\omega,\:c_{1},\:c_{2}\) are weight coefficients, \(r_{1},\:r_{2}\) are random factors, and \(p_{i}^{t},\:g^{t}\) represent the individual and global optimal positions, respectively.

The optimization objective of the PSO algorithm is to minimize the structural discrepancy between the weighted Minkowski distance matrix and the original one. Structural consistency is measured by the mean square error (MSE), which is mathematically defined as follows:

$$\mathcal{L}_{MSE}=\frac{1}{N}\sum_{i,j}\left(d_{M}^{(w)}(x_{i},x_{j})-d_{M}(x_{i},x_{j})\right)^{2}$$
(16)

Here, \(\:{d}_{M}^{\left(w\right)}({x}_{i},{x}_{j})\) represents the Minkowski distance after weight optimization, and \(\:N\) indicates the total number of non-redundant distance pairs.
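A minimal PSO sketch of Eqs. (14)–(16) follows. The particle bounds and the choice to search only \((w_{1}, w_{3})\) with \(w_{2}=1\) are assumptions made for illustration, consistent with the constraints stated under Eq. (13).

```python
import numpy as np

def pso_optimize_weights(d, tiers, n_particles=20, iters=100, seed=0):
    """Minimal PSO (Eqs. 14-15) searching (w1, w3) to minimize the MSE of Eq. (16).

    d: flattened pairwise distances; tiers: 0/1/2 labels (near/medium/far).
    """
    rng = np.random.default_rng(seed)
    omega, c1, c2 = 0.7298, 1.4945, 1.4945            # coefficients used in the paper
    lo, hi = np.array([1.0, 0.1]), np.array([2.0, 1.0])  # w1 > 1, w3 < 1 bounds

    def mse(p):
        w = np.array([p[0], 1.0, p[1]])[tiers]        # per-pair weight by tier
        return np.mean((w * d - d) ** 2)              # Eq. (16)

    x = rng.uniform(lo, hi, (n_particles, 2))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([mse(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, 1))
        r2 = rng.random((n_particles, 1))
        v = omega * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # Eq. (14)
        x = np.clip(x + v, lo, hi)                                 # Eq. (15)
        f = np.array([mse(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, mse(g)
```

The global best is monotonically non-increasing in this scheme, which mirrors the convergence behavior discussed later for Fig. 5.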

Step 4: Dimensionality reduction and optimization

The similarity probabilities, cross-entropy objective, and gradient update in VUMAP follow the same formulations as in UMAP (Eqs. 3–10). The only modification is that Euclidean distances \(\:{d}_{E}\) are replaced by the optimized weighted Minkowski distances \(\:{d}_{M}^{\left(w\right)}\). This substitution enables VUMAP to preserve the same optimization pipeline as UMAP while improving feature weighting and structural fidelity.

Fig. 3

VUMAP algorithm flowchart.

Model building

Overall process of the measurement system model

In producing gas-polyethylene pipes, the complex interactions among the five quality characteristics—wall thickness, inner diameter, outer diameter, concentricity, and ovality—make it challenging for traditional measurement system analysis to assess their performance fully. This paper proposes a multivariate measurement system analysis model based on VUMAP to address this issue. The model comprises four modules: data preprocessing and correlation analysis, correlation discrimination of quality characteristics, VUMAP dimensionality reduction based on dynamic weighting, and multivariate measurement system capability evaluation. Data preprocessing reveals the linear and nonlinear relationships among quality characteristics, providing a basis for dimensionality reduction analysis. The VUMAP algorithm, based on dynamic weighting optimization, further extracts nonlinear information from the data and reduces its dimensionality. Finally, the dimensionality reduction data are visualized and quantitatively evaluated to comprehensively assess the performance of the multivariate measurement system under the influence of complex interactions. The general flowchart of the model is demonstrated in Fig. 4.

Fig. 4

VUMAP-Based multivariate measurement system analysis flowchart.

Data preprocessing and correlation analysis

Data outlier detection and preprocessing

To maintain data integrity and enhance the reliability of subsequent analytical processes, outliers in the collected production dataset must be detected and processed. This study employs the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm26 to detect outliers in the dataset. DBSCAN identifies and removes low-density regions (outliers) by defining the density characteristics of sample points.

The specific formula description and the definition of core points are as follows:

$$\:\left|{N}_{\epsilon}\left(p\right)\right|\ge\:MinPts$$
(17)

where \(N_{\epsilon}(p)\) is the set of points within the \(\epsilon\)-radius neighborhood of \(p\), \(\left|N_{\epsilon}(p)\right|\) is its cardinality, and \(MinPts\) is the density threshold. Points satisfying Eq. (17) are core points; points that are neither core points nor density-reachable from one are treated as outliers.

This preprocessing step ensures the dataset has good quality consistency and provides a reliable basis for subsequent analysis.
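As a sketch of this outlier-removal step using scikit-learn's `DBSCAN`, the synthetic data and the parameter choices `eps=0.5`, `min_samples=5` below are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),   # dense, production-like cluster
               [[5.0, 5.0]]])                        # one obvious outlier
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
X_clean = X[labels != -1]                            # drop low-density (noise) points
```

`fit_predict` assigns the label `-1` to points that fail the density condition of Eq. (17), so filtering on that label implements the removal of low-density regions.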

Pearson correlation analysis

Pearson’s correlation analysis27 measures the degree of linear correlation among various quality characteristics. The correlation coefficient is calculated as follows:

$$\:{r}_{xy}=\frac{{\sum\:}_{i=1}^{n}({x}_{i}-\bar{x})({y}_{i}-\bar{y})}{\sqrt{{\sum\:}_{i=1}^{n}({x}_{i}-\bar{x}{)}^{2}}\sqrt{{\sum\:}_{i=1}^{n}({y}_{i}-\bar{y}{)}^{2}}}$$
(18)

Where \(x_{i},\:y_{i}\) are the i-th sample values of the two variables, and \(\bar{x},\:\bar{y}\) are their respective means.

The analysis proceeds as follows: first, the Pearson correlation coefficient is calculated for each pair of quality characteristics and assembled into a matrix; then, a heatmap is drawn, in which color intensity reflects the degree of correlation, to visually display the strength of association between characteristics and provide a reference for data dimensionality reduction.
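The pairwise coefficients of Eq. (18) can be assembled into a matrix with `np.corrcoef`; the three synthetic "quality characteristics" below are hypothetical stand-ins, not measured pipe data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
wall = rng.normal(14.0, 0.05, n)                              # hypothetical wall thickness
outer = 161.0 + 2.0 * (wall - 14.0) + rng.normal(0, 0.01, n)  # linearly linked feature
ovality = rng.normal(0.5, 0.02, n)                            # independent feature
R = np.corrcoef(np.vstack([wall, outer, ovality]))            # matrix of Eq. (18) values
```

The resulting matrix `R` is exactly what the heatmap visualizes: entries near ±1 flag strongly correlated characteristic pairs, entries near 0 flag independent ones.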

Visual analysis of pairwise relationships and distributions

A pair plot is used to visualize the data and further explore the potential relationship between its distribution structure and its quality characteristics. The pair plot displays the two-dimensional scatter distribution between each pair of characteristics, and combined with the univariate distribution histograms of each characteristic on the diagonal, it is possible to determine whether there is a nonlinear correlation between the characteristics and analyze the distribution type of the data.

The data preprocessing and correlation analysis module integrates outlier detection and distribution correlation analysis methods, which are key steps in analyzing multivariate quality characteristic data. This module provides stable and reliable data support for the subsequent VUMAP dimensionality reduction analysis and lays the foundation for multivariate measurement system analysis.

Multivariate quality characteristic dimension reduction judgment and dynamic weighting dimension reduction

Correlation test for qualitative characteristics

The correlation test between multiple quality characteristics, based on the data preprocessing and correlation analysis module, determines the next step: where statistically significant inter-characteristic associations are detected, the VUMAP-based multivariate measurement system analysis model is applied; if the quality characteristics are independent and uncorrelated, the traditional univariate measurement system analysis is used.

Convergence analysis of weight allocation via PSO optimization

The PSO algorithm is employed to optimize the distance weight parameters in VUMAP to achieve an optimal weighted configuration of quality characteristics. To ensure the algorithm’s convergence and stability, the following hyperparameters were set: \(\omega=0.7298\), \(c_{1}=1.4945\), \(c_{2}=1.4945\), and 200 iterations. Additionally, to verify the stability and convergence of the weight optimization process, the convergence trajectory of the objective function (MSE) across PSO iterations is illustrated in Fig. 5.

Fig. 5

PSO optimization convergence curve.

As shown in Fig. 5, the PSO algorithm significantly reduced the objective function value within the first 50 iterations, with the MSE decreasing to approximately 0.4, demonstrating strong global search capability during the early optimization phase. After the 85th iteration, the error stabilizes and eventually converges to a low value of approximately 0.01, indicating that the algorithm has entered a steady state. This convergence trend highlights the reliability of the weight allocation process and the robustness of the final optimization outcome, thereby providing a solid foundation for the subsequent VUMAP-based dimensionality reduction.

Multivariate measurement system competency evaluation

Data visualization

Based on the data reduced to two dimensions, a scatter plot is used to visually analyze and classify the measurement system, intuitively displaying the fluctuation and distribution characteristics of the multivariate measurement system. The specific steps are as follows: first, the reduced data is classified and labeled based on the number of measurements, the measurement part number, and the measurement personnel (factory) number; then, a scatter plot is used to visually represent the data points of different classifications in two-dimensional space. The formula for data distribution is defined as follows:

$$\:{S}_{ijk}=\left\{\left({x}_{n},{y}_{n}\right)|n\in\:category\:i,j,k\right\}$$
(19)

Where: \(\:{x}_{n},{\:y}_{n}\) are the coordinates of the nth data point in two-dimensional space; \(\:i,j,k\:\)denotes the number of measurements, the measurement part number, and the measurement personnel (factory) number, respectively.

Finally, an intuitive analysis is conducted. The scatter plot clearly illustrates the distribution of different data types in two-dimensional space. By observing the clustering, dispersion, and distribution patterns of the points, the repeatability and reproducibility of the measurement system can be qualitatively analyzed.
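The grouping of Eq. (19) amounts to indexing the embedded points by the (trial, part, operator/factory) triple; the labels and coordinates below are hypothetical.

```python
from collections import defaultdict

# Each embedded point carries labels (i: measurement number, j: part number,
# k: operator/factory); grouping by the triple yields the category sets S_ijk.
points = [((1, 3, "A"), (0.20, 0.50)),
          ((1, 3, "A"), (0.25, 0.48)),   # a repeat measurement of the same part
          ((2, 7, "B"), (-1.10, 0.90))]
S = defaultdict(list)
for (i, j, k), xy in points:
    S[(i, j, k)].append(xy)
```

Each group `S[(i, j, k)]` is then plotted as one scatter series; tight groups indicate good repeatability, while overlap across operator labels indicates good reproducibility.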

After the two-dimensional visual analysis, the data are further reduced to a one-dimensional projection for quantitative MSA. These statistical process control (SPC) tools provide quantifiable evaluation criteria that systematically expose the measurement system’s variation components and performance metrics. Combining the spatial pattern analysis of the 2D visualization with the statistical assessment of the 1D MSA enables a comprehensive evaluation of measurement reliability and validity.

Evaluation of measurement system capabilities

For data reduced to one dimension, a univariate measurement system analysis method is employed to quantitatively evaluate the measurement system’s performance, focusing on the two key indicators: %R&R and NDC.

First, analysis of variance (ANOVA) is employed to quantify the total variation \(\:{\sigma\:}_{T}\), the variation attributable to the measurement system \(\:{\sigma\:}_{MS}\), and the variation among parts \(\:{\sigma\:}_{P}\).

$$\sigma_{MS}=\sqrt{\frac{SS_{MS}}{n_{MS}}},\qquad\sigma_{P}=\sqrt{\frac{SS_{P}}{n_{P}}}$$
(20)
$$\:{\sigma\:}_{T}^{2}={\sigma\:}_{P}^{2}+{\sigma\:}_{MS}^{2}$$
(21)

In the formula, \(\:S{S}_{MS}\) denotes the sum of squares due to the measurement system, and \(\:{n}_{MS}\) represents its degrees of freedom; \(\:S{S}_{P}\) denotes the sum of squares among parts, and \(\:{n}_{P}\) denotes the corresponding degrees of freedom.

(1) Percentage of Repeatability and Reproducibility (%R&R)

The %R&R is a critical metric for assessing the variation in a measurement system28. It measures the proportion of the measurement system’s total error relative to the total variation. The formula is defined as:

$$\:\text{\%}R\&R=\frac{{\sigma\:}_{MS}}{{\sigma\:}_{T}}\times\:100\text{\%}$$
(22)

(2) Number of Distinct Categories (NDC)

NDC quantifies the number of distinct part categories that the measurement system can reliably distinguish, i.e., without confusing larger and smaller specimens. It characterizes the system’s resolution: the smallest part-to-part variation it can detect. It is defined as:

$$\:NDC=\lfloor1.414\times\:\frac{{\sigma\:}_{P}}{{\sigma\:}_{MS}}\rfloor$$
(23)
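Equations (20)–(23) can be combined into a short helper. The following Python sketch uses illustrative function and variable names (not from the paper) and assumes the sums of squares and degrees of freedom have already been obtained from the ANOVA table:

```python
import math

def msa_metrics(ss_ms, n_ms, ss_p, n_p):
    """Compute %R&R and NDC from ANOVA sums of squares (Eqs. 20-23)."""
    sigma_ms = math.sqrt(ss_ms / n_ms)              # measurement-system std, Eq. (20)
    sigma_p = math.sqrt(ss_p / n_p)                 # part-to-part std, Eq. (20)
    sigma_t = math.sqrt(sigma_p**2 + sigma_ms**2)   # total std, Eq. (21)
    pct_rr = sigma_ms / sigma_t * 100.0             # %R&R, Eq. (22)
    ndc = math.floor(1.414 * sigma_p / sigma_ms)    # NDC, Eq. (23)
    return pct_rr, ndc
```

A small %R&R together with a large NDC (see Table 1) indicates a measurement system whose error is negligible relative to genuine part-to-part variation.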

In the analysis of multivariate measurement systems, repeatability and reproducibility (%R&R) and the number of distinct categories (NDC) are the core metrics for evaluating the measurement system’s performance. This paper establishes a grading standard, as shown in Table 1, to classify the measurement system’s performance accurately.

Table 1 Performance evaluation criteria for measurement systems.

Experimental verification

Experimental design and parameter setting

In producing gas polyethylene (PE) pipes, the measurement system’s accuracy and stability directly determine the pipes’ quality pass rate and application performance. This experiment used polyethylene gas pipes produced in three factories (A, B, and C) as the research objects. The measurement data were analyzed and evaluated with the VUMAP multivariate measurement system capability analysis model. The experiment employed a high-precision polyethylene gas pipe production extruder (see Fig. 6), which stably produces gas pipes with a diameter of 161 ± 0.1 mm and exhibits excellent dimensional control and extrusion-rate control capabilities.

Fig. 6
figure 6

Physical diagram of the extruder.

The measurement objects are the same batch of gas-polyethylene pipes from three factories. Ten pipes were randomly selected as experimental samples to ensure the representativeness and comprehensiveness of the data. Three cross-sections were evenly selected along the axis of each pipe as measurement points, and five key quality characteristics at each cross-section were measured. The five quality characteristics measured include wall thickness (\(\:e\)), outer diameter (\(\:D\)), inner diameter (\(\:d\)), concentricity (\(\:\delta\:\)), and ovality (\(\epsilon\)). Among them, the wall thickness is used to evaluate the strength and durability of the pipe; the outer diameter and inner diameter affect the connection compatibility and fluid transportation capacity of the pipe, respectively; and the concentricity and ovality are closely related to the symmetry and structural integrity of the pipe.

This experiment employed an ultrasonic testing device for measurement, with an accuracy of ± 0.001 mm. The device automatically performs sampling and measurement to ensure the accuracy and stability of the data. In the experimental design, a total of 10 pipes (p = 10) were selected, three factories were chosen for production (o = 3), and each pipe was measured three times (r = 3) to obtain representative measurement data. The entire measurement process maintained consistent environmental conditions to minimize the influence of external factors on the data.

Dimensionality reduction algorithm comparison and evaluation

Inter- and Intra-Class distances

Intra- and inter-class distances are important metrics for evaluating the effectiveness of dimensionality reduction. The intra-class distance (\(\:{S}_{w}\)) measures the similarity between samples within the same class, while the inter-class distance (\(\:{S}_{b}\)) measures the difference between samples from different classes. By maximizing the ratio of inter-class to intra-class distance, the separability of the reduced-dimensional data can be significantly enhanced.

The formulae for calculating the intra-class distance \(\:{S}_{w}\) and inter-class distance \(\:{S}_{b}\) are:

$$S_{w}=\mathrm{tr}\left(\mathbf{S}_{w}\right)=\sum_{i=1}^{L}\sum_{j=1}^{N_{i}}\left\|x_{ij}-\mu_{i}\right\|^{2}$$
(24)
$$S_{b}=\mathrm{tr}\left(\mathbf{S}_{b}\right)=\sum_{i=1}^{L}N_{i}\left\|\mu_{i}-\mu\right\|^{2}$$
(25)

where \(\:{x}_{ij}\) denotes the j-th sample vector of the i-th class in the feature space, \(\:{\mu\:}_{i}\) the mean vector of the i-th class, and \(\:\mu\:\) the overall mean vector of all samples.
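The two scatter quantities can be computed directly from a labeled sample matrix. The sketch below (function and variable names are ours, not the paper’s) follows Eqs. (24)–(25):

```python
import numpy as np

def class_scatter(X, labels):
    """Traces of the within-class (S_w) and between-class (S_b) scatter
    matrices, per Eqs. (24)-(25)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                              # overall mean vector
    s_w = s_b = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)                       # class mean vector
        s_w += np.sum((Xc - mu_c) ** 2)              # within-class squared distances
        s_b += len(Xc) * np.sum((mu_c - mu) ** 2)    # between-class contribution
    return s_w, s_b
```

The ratio `s_b / s_w` is then the separability score compared across the four algorithms in Table 2.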

To compare and evaluate the effects of different dimensionality reduction algorithms, this study selects four methods: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and the variable-weight UMAP (VUMAP) proposed in this paper. Each method’s intra-class distance \(\:{S}_{w}\), inter-class distance \(\:{S}_{b}\), and inter-class-to-intra-class ratio \(\:{S}_{b}/{S}_{w}\) are compared.

Appropriate parameters were configured for each dimensionality reduction algorithm to ensure a fair comparison, and the reduction was uniformly conducted to two dimensions across all experiments. The parameter configurations for each algorithm are summarized as follows: PCA is a linear dimensionality reduction method, using the default Euclidean metric; t-SNE was configured with \(\:perplexity=30,\:n\_iter=1000\), metric = “Euclidean” (default); UMAP was set with \(\:n\_neighbors=15,\:\:min\_dist=0.1\), metric = “Euclidean” (default); VUMAP was optimized using \(\:n\_neighbors=15,\:min\_dist=0.1\), metric = “Minkowski”.
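As a configuration sketch, the settings above can be collected in one place. The dictionary below and its mapping to library keyword arguments (e.g. `n_neighbors` and `min_dist` in `umap-learn`, `perplexity` in scikit-learn’s t-SNE) are assumptions about how one would reproduce the comparison; VUMAP’s dynamic-weighting code is not public, so it is represented here only by its stated UMAP-style parameters:

```python
# Hypothetical parameter registry mirroring Table 2; keys and values follow
# the configurations stated in the text, not released code.
reducer_params = {
    "PCA":   {"n_components": 2},                       # linear baseline, Euclidean metric
    "t-SNE": {"n_components": 2, "perplexity": 30,
              "n_iter": 1000, "metric": "euclidean"},
    "UMAP":  {"n_components": 2, "n_neighbors": 15,
              "min_dist": 0.1, "metric": "euclidean"},
    "VUMAP": {"n_components": 2, "n_neighbors": 15,
              "min_dist": 0.1, "metric": "minkowski"},  # plus dynamic variable weights
}
```

Each entry would be unpacked into the corresponding estimator constructor (e.g. `umap.UMAP(**reducer_params["UMAP"])`).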

Table 2 Parameter values for four standard dimensionality reduction algorithms.

As shown in Table 2, the intra-class distance \(\:{S}_{w}\) of the UMAP and VUMAP dimensionality reduction algorithms is small, indicating that similar samples are closely clustered in the reduced-dimensional space. In contrast, their inter-class distance \(\:{S}_{b}\) is large, indicating significant differences between samples of different categories. Thus, UMAP and VUMAP exhibit superior performance in category distinction. In particular, the inter-class-to-intra-class ratio \(\:{S}_{b}/{S}_{w}\) of VUMAP is significantly higher than that of the other algorithms, underscoring its advantage in separating categories.

Visualization of the dimensionality reduction of the four dimensionality reduction algorithms

To further qualitatively evaluate the effects of dimensionality reduction, this study employs data visualization analysis, comparing the results of different algorithms using scatter plots of the reduced-dimensional data in two-dimensional space. The four algorithms are analyzed across three dimensions: the number of measurements, the measurement plant, and the measurement part number. Figure 7 illustrates the visual embedding results of the four dimensionality reduction algorithms across different dimensions. The horizontal and vertical axes indicate the projected positions of high-dimensional data points within a two-dimensional space following dimensionality reduction.

Fig. 7
figure 7

Two-dimensional (2D) embedding visualization results of the four dimensionality reduction algorithms. (a) PCA, (b) t-SNE, (c) UMAP, (d) VUMAP.

As shown in Fig. 7, VUMAP demonstrates the best performance in visualizing the data distribution after dimensionality reduction. The embedding results cluster samples within each class more tightly and separate samples between classes more distinctly. This further validates the superiority of the proposed VUMAP algorithm in data dimensionality reduction tasks.

In summary, through quantitative analysis of intra-class and inter-class distances and qualitative evaluation of visual analysis, the proposed VUMAP algorithm demonstrates significantly greater effectiveness than other algorithms in reducing the dimensionality of high-dimensional data, particularly in feature separability, providing more valuable results for subsequent measurement system analysis.

Comparison of nonlinear structure retention capabilities

To further validate the superiority of the proposed VUMAP method over traditional PCA in preserving nonlinear structures, the classic Swiss Roll dataset is employed as a benchmark for nonlinear structure validation. A quantitative evaluation is conducted by calculating the trustworthiness and continuity29,30 metrics for both methods, which measure their ability to preserve high-dimensional adjacency relationships. Trustworthiness quantifies the extent to which low-dimensional neighbors were also neighbors in the original high-dimensional space, while continuity assesses the degree to which high-dimensional neighbors remain neighbors in the low-dimensional embedding. Both metrics have been widely adopted in prior studies to assess topological preservation performance.
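A minimal reproduction of the PCA baseline on the Swiss Roll can be written with scikit-learn, whose `trustworthiness` function implements the first metric. Computing continuity by swapping the roles of the two spaces is a common shortcut we assume here, not something the paper specifies:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Generate the Swiss Roll benchmark and embed it linearly with PCA.
X, _ = make_swiss_roll(n_samples=500, random_state=0)
X_2d = PCA(n_components=2).fit_transform(X)

# Trustworthiness: are low-dimensional neighbors genuine high-dimensional neighbors?
t_pca = trustworthiness(X, X_2d, n_neighbors=5)
# Continuity: obtained here by swapping the two spaces (assumed shortcut).
c_pca = trustworthiness(X_2d, X, n_neighbors=5)
```

The same two scores computed on a VUMAP embedding would give the comparison row of Table 3.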

Table 3 Comparison of topological preservation between VUMAP and PCA.

Table 3 presents the structural preservation metrics of PCA and VUMAP on the Swiss Roll dataset, averaged over multiple experimental runs. Given the randomness that initialization and stochastic sampling introduce into dimensionality reduction, each method was executed multiple times, and the results are reported as mean ± standard deviation. VUMAP significantly outperforms PCA on both metrics (\(P<0.01\)), confirming its superior capability to retain nonlinear information.

Analysis of the VUMAP-based multivariate measurement system

Module 1 data preprocessing and correlation analysis

Step1 Data outlier detection and preprocessing

During the production of gas polyethylene pipes, an ultrasonic thickness gauge is employed to accurately measure key production parameters, including wall thickness, outer diameter, inner diameter, concentricity, and ovality. To ensure the accuracy of data analysis, outlier detection was first conducted. Potential outliers were identified using the DBSCAN clustering algorithm, and its parameters were optimized through grid search (epsilon = 0.30004, min_samples = 1) to ensure data integrity and consistency. After processing, it was confirmed that the original dataset contained no significant outliers. As shown in Table 4, the preprocessed dataset provides reliable data for subsequent analysis.
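A hedged sketch of the outlier screen, using scikit-learn’s `DBSCAN` with the grid-searched parameters above as defaults. Note that with `min_samples = 1` every point is a core point and nothing is ever labeled noise, so the illustration flags outliers with a denser setting; the function name is ours:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_outliers(X, eps=0.30004, min_samples=1):
    """Return a boolean mask marking DBSCAN noise points (label -1) as outliers.
    Defaults mirror the grid-searched parameters reported in the text."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(np.asarray(X, dtype=float))
    return labels == -1
```

For example, with a denser requirement (`min_samples=3`), an isolated point far from a tight cluster is flagged while the cluster members are not.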

Table 4 Pretreatment dataset for the production process of gas polyethylene pipes (partial data).

Step2 Pearson correlation analysis

A Pearson correlation analysis was conducted on the preprocessed dataset, and a Pearson correlation coefficient matrix was constructed. A heatmap was then generated to explore the linear relationships between key production parameters.
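With pandas, constructing the matrix behind the heatmap reduces to one call. The column names and stand-in data below are illustrative, since the real measurements are those of Table 4:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Stand-in values for the five quality characteristics (illustrative only).
df = pd.DataFrame(rng.normal(size=(90, 5)),
                  columns=["wall_thickness", "outer_dia", "inner_dia",
                           "concentricity", "ovality"])
corr = df.corr(method="pearson")  # 5x5 Pearson correlation coefficient matrix
```

A heatmap such as Fig. 8 can then be rendered from `corr`, e.g. with `seaborn.heatmap(corr, annot=True)`.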

Figure 8 shows that the strongest positive correlation is between wall thickness and ovality, with a correlation coefficient of 0.83. This suggests that an increase in wall thickness may lead to uneven deformation of the pipe cross-section, thereby increasing ovality. Additionally, wall thickness exhibits slight (− 0.27) and moderate (− 0.35) negative correlations with outer and inner diameter, respectively. This implies that, under certain production conditions, an increase in wall thickness may correspond to a reduction in outer and inner diameters. A strong positive correlation (0.80) between outer and inner diameter reflects their close physical relationship. Similarly, a moderate positive correlation (0.53) between outer diameter and concentricity suggests that changes in outer diameter may influence pipe concentricity.

Fig. 8
figure 8

Heatmap of Pearson correlation coefficients for key production parameters.

Step3 Visual analysis of pairwise relationships and distributions

To better understand the interrelationships and structural distributions among key production parameters, pairwise relationships between each parameter were further analyzed and visualized. As shown in Fig. 9, the results reveal linear correlations and more complex nonlinear interactions between the parameters. For instance, the weak correlation (0.08) between outer diameter and ovality suggests no direct relationship. In contrast, the moderate negative correlation (− 0.47) between inner diameter and ovality implies that an increase in inner diameter may reduce ovality. This analysis highlights the complexity of interactions in the production process, indicating that analyzing parameters individually may fail to capture their intrinsic relationships fully.

Fig. 9
figure 9

Pairwise relationship diagram of key production parameters.

The above analysis demonstrates that the key parameters in the production process of gas polyethylene pipes exhibit complex interactions. Traditional univariate measurement system analysis methods are insufficient to fully reveal the measurement system’s true capability. Therefore, the VUMAP-based multivariate measurement system analysis model proposed in this study is employed to evaluate the system’s performance more accurately.

Module 2 VUMAP dimensionality reduction based on dynamic weight assignment

Building on the preceding overview and process analysis of the VUMAP dimensionality reduction algorithm, this section employs Python to reduce the dimensionality of the preprocessed, outlier-checked dataset. The VUMAP algorithm was applied to generate one-dimensional and two-dimensional embeddings, supporting quantitative system evaluation and qualitative visualization, respectively. The results of the dimensionality reduction are presented in Tables 5 and 6.

Table 5 One-Dimensional data (partial) after VUMAP-based dimensionality reduction.
Table 6 Two-Dimensional data (partial) after VUMAP-based dimensionality reduction.

Module 3 multivariate measurement system capability evaluation

Step1 Data visualization

To evaluate the capabilities of the multivariate measurement system visually, the reduced-dimensional data is first classified and analyzed using a scatter plot. Specifically, the data is classified based on three key labels: the number of measurements, the measurement part number, and the measurement plant. The results are visualized in the scatter plot shown in Fig. 10. The horizontal and vertical axes indicate the projected positions of high-dimensional data points within a two-dimensional space following dimensionality reduction.

Fig. 10
figure 10

Scatter plot based on VUMAP dimensionality reduction embedding.

Combined with the data distribution after VUMAP-based dimensionality reduction shown in Fig. 10, the following observations can be made: In the graph labeled by the number of measurements, the closely distributed point clusters indicate high consistency in repeated measurements over time. In the graph labeled by the measurement factory, the lack of significant differences in the distribution of point clusters across factories suggests good reproducibility of the measurement system. In the graph labeled by the measured part, the clear separation of point clusters for different parts demonstrates the measurement system’s strong ability to distinguish between parts, with variation primarily stemming from part differences rather than random measurement errors.

Based on the scatter plot analysis in Fig. 10, the high-dimensional data is further reduced to one dimension, and the reduced data is analyzed using the R&R method, as illustrated in Fig. 11.

Fig. 11
figure 11

Results of the R&R analysis for the measurement system’s gauges.

The following conclusions can be derived from Fig. 11:

(1) Variance Component Plot: The variance between parts is significantly higher than the repeatability and reproducibility components, indicating that the primary source of variation in the measurement system stems from part differences. This further demonstrates the system’s strong discrimination ability.

(2) R Control Chart: Most measurement points fall within the control limits, indicating that error fluctuations in the measurement process are well-controlled across factories. This demonstrates the system’s high consistency.

(3) Xbar Control Chart: Most points lie outside the control limits, indicating that the sample selection effectively captures the variability between parts, with significant differences in average values.

(4) Part Measurement Value Graph: The minor differences between multiple measurements of the same part confirm the system’s stability and consistency in repeated measurements.

(5) Operating Plant Measurement Value Diagram: The near-parallel average values across measurement plants indicate the system’s reproducibility and consistent measurement results across different plants.

(6) Interaction Diagram: The near-overlapping measurement lines of the operating factories, with no significant crossings, indicate no significant interaction between factories and parts; the measurement error primarily arises from part differences.

Step2 Evaluation of measurement system capabilities

Based on the data visualization and analysis, the second step involves evaluating the measurement system capability of the reduced-dimensional dataset. For the high-dimensional quality characteristic data of the gas-polyethylene pipeline production process, this study compares the measurement system performance before and after dimensionality reduction, validating the effectiveness of the multivariate measurement system analysis model based on VUMAP. In this study, the original high-dimensional dataset and its reduced-dimensional counterpart are consistently analyzed using MINITAB, where the measurement system variance is computed alongside other key statistical parameters. The evaluation indices %R&R and NDC for each measurement system were calculated using Eqs. (22)-(23), with the specific results presented in Table 7.

Table 7 Evaluation of measurement system capability for original and reduced data.

As shown in Table 7, the %R&R and NDC values for the original high-dimensional dataset vary substantially across quality characteristics. For instance, the %R&R reaches 29.49%, nearing the acceptable upper limit, while the NDC is only 4, indicating large measurement system variation and weak discrimination ability for certain characteristics. After reduction to one dimension, the %R&R drops significantly to 1.94%, and the NDC increases to 72, demonstrating a marked improvement in measurement efficiency. Moreover, the measurement system variation after reduction (\(\:{\widehat{\sigma\:}}_{MS}\) = 0.3463) is far smaller than the between-parts variation (\(\:{\widehat{\sigma\:}}_{P\:}\) = 17.8711), indicating effective control of measurement error and retention of inter-part variation information. The VUMAP-based dimensionality reduction method thus enhances the measurement system’s repeatability, reproducibility, and discrimination ability while reducing sources of system variation.
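Substituting the quoted variance components into Eqs. (21)–(23) reproduces the reported post-reduction figures:

```python
import math

sigma_ms, sigma_p = 0.3463, 17.8711            # variance components from Table 7
sigma_t = math.sqrt(sigma_ms**2 + sigma_p**2)  # Eq. (21)
pct_rr = sigma_ms / sigma_t * 100              # Eq. (22): ~1.94 %
ndc = math.floor(1.414 * sigma_p / sigma_ms)   # Eq. (23): 72
```

This consistency check confirms that the tabulated %R&R and NDC follow directly from the estimated standard deviations.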

Robustness analysis

Because the VUMAP-based multivariate measurement system evaluation method is newly proposed, a hierarchical bootstrap analysis31 was conducted to assess the robustness and consistency of its evaluation results. A nested “parts–measurers–repeated measurements” structure was adopted, and 30 bootstrap iterations were performed. The %R&R and NDC indicators were calculated separately, and the results are reported as mean ± standard deviation (SD). Statistical significance was assessed via independent-samples t-tests.

Table 8 Bootstrap-based robustness evaluation of %R&R and NDC.

As shown in Table 8, the %R&R and NDC indicators obtained by the VUMAP method exhibit minimal variation across multiple resampling iterations, with consistently low standard deviations, demonstrating high stability and consistency. Furthermore, the VUMAP method exhibits statistically significant differences from traditional dimensionality reduction methods, indicating that the measurement capability assessment results are statistically robust and demonstrate strong repeatability and generalizability.
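A sketch of the hierarchical resampling scheme: parts (the top level of the nesting) are drawn with replacement while each part’s nested plant/repeat measurements stay intact. The statistic shown is an illustrative discrimination ratio, not the paper’s exact %R&R computation, and all names are ours:

```python
import numpy as np

def hierarchical_bootstrap(data, stat_fn, n_boot=30, seed=0):
    """Resample parts (rows) with replacement, keeping each part's nested
    measurements (columns) intact; return mean and SD of stat_fn over replicates."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(data), size=len(data))  # resample part indices
        reps.append(stat_fn(data[idx]))                   # nested columns preserved
    return float(np.mean(reps)), float(np.std(reps, ddof=1))

def discrimination_ratio(sample):
    """Illustrative statistic: part-to-part SD over mean within-part SD."""
    return sample.mean(axis=1).std(ddof=1) / sample.std(axis=1, ddof=1).mean()
```

Replacing `discrimination_ratio` with the %R&R and NDC computations yields bootstrap distributions like those summarized in Table 8.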

Conclusions and discussion

Conclusions

Given the challenges in comprehensively evaluating the measurement system performance for the five key quality characteristics of gas polyethylene pipe production—wall thickness, inner diameter, outer diameter, concentricity, and ovality—due to their complex interactions, this study proposes a multivariate measurement system analysis model based on VUMAP dimensionality reduction. This model innovatively integrates dynamic weighting, nonlinear information retention, and multi-module system analysis methods. The model comprises three modules: data preprocessing and correlation analysis, VUMAP dimensionality reduction with dynamic weighting, and multivariate measurement system capability evaluation, offering a novel approach to quality management in complex production scenarios.

First, the original data undergoes preprocessing, and correlation analysis is conducted to identify significant interactions among quality characteristics. Second, based on the correlation analysis results, the decision is to employ either a univariate or multivariate measurement system analysis method. Then, the VUMAP algorithm with dynamic weighting is applied for dimensionality reduction, preserving key data characteristics to the greatest extent and generating a low-dimensional dataset. Finally, the low-dimensional dataset is comprehensively analyzed by integrating data visualization with quantitative evaluation. The model’s superiority is validated using a dataset from gas-polyethylene pipe production: after dimensionality reduction, the %R&R of the measurement system decreased significantly (from 29.49% to 1.94%), while the NDC increased markedly (from 4 to 72), demonstrating substantial improvement in measurement system performance. Additionally, visual analysis reveals a clear separation of point clouds across different parts and minimal variation among point clouds from different factories, further confirming the measurement system’s distinguishing ability and consistency. The results demonstrate that the VUMAP-based multivariate measurement system analysis model effectively controls measurement errors, retains part difference information, and excels in handling complex nonlinear data, fully addressing the measurement needs of production applications.

Limitations and application prospects

Despite its effectiveness, the proposed method still has several limitations. First, the efficiency of the dynamic weighting mechanism may degrade when applied to large-scale datasets. Second, quantitatively modeling local quality characteristics’ contributions to overall measurement system errors remains challenging. Third, since the weight assignment is derived from unsupervised clustering of pairwise sample distances followed by optimization, it is difficult to establish an explicit correspondence between the assigned weights and the physical significance of individual quality characteristics, which limits the interpretability of the results. Future research should therefore focus on enhancing the algorithm’s scalability for industrial-scale data, incorporating causal inference and deep learning techniques to identify error sources better, and introducing domain expertise or physical constraints to improve the model’s engineering interpretability and practical applicability.

We plan to develop a National Quality Infrastructure (NQI) demonstration platform for polyethylene gas pipelines in future work. We will embed the proposed model into the quality control panel as the core algorithm for multivariate quality monitoring during production. The model requires no additional hardware and can seamlessly integrate into existing manufacturing systems. It is expected to reduce manual inspection workload, improve the efficiency of multivariate anomaly detection, and enhance product yield through a timely feedback mechanism.