Introduction

Metrology is closely tied to people’s livelihoods, and metrological science is widely applied in food safety, national defense, trade settlement, medical and health care, and other fields. Compared with developed Western countries, China began theoretical research on modern measurement systems late; its Metrology Law was officially implemented in 1985. Before the domestic calibration and testing market was opened, China’s metrological administrative departments at all levels of government and their technical institutions, that is, the statutory metrological verification institutions, carried out national compulsory verification, undertook non-mandatory verification work, and provided calibration services to the public in order to guarantee the uniformity of quantity values across the country. With the development of modern science and technology and the initial establishment of the socialist market economic system, the measurement system of that time showed many drawbacks: its lagging development meant that the original purpose and practical value of calibration work could not be guaranteed, and it could not fully meet the requirements of current economic and social development1. 
The “Several Opinions of the State Council on Accelerating the Development of Science and Technology Service Industry”, issued by the State Council in 2014, proposed to “accelerate the development of third-party inspection, testing and certification services, and encourage different ownership inspection, testing and certification institutions to participate in market competition on an equal basis.” The document also called for supporting qualified inspection, testing and certification bodies in decoupling from administrative departments and transforming into enterprises; accelerating cross-department, cross-industry and cross-level integration, mergers and acquisitions; and cultivating a number of inspection, testing and certification groups with strong technical capabilities, high service levels and good economies of scale. It further called for improving the planning and layout of inspection, testing and certification bodies, strengthening the construction of national quality inspection centers and testing laboratories, building an industrial measurement and testing service system, strengthening the construction of national industrial measurement and testing centers, and establishing a metrological science and technology innovation alliance. In 2015, the CNCA also clearly proposed in the “13th Five-Year Plan” that “by 2020, the testing and certification market should further develop in the direction of industrialization, scale and specialization, guide practitioners to improve the mechanism and moderate concentration, and encourage government departments at all levels, in the process of performing management functions and providing public services, to purchase testing and certification services in a market-oriented manner”. In 2018, the fifth revision of the Metrology Law of the People’s Republic of China defined the marketization of the measurement, testing and calibration business more clearly and encouraged more qualified testing institutions to participate in market competition. 
At the same time, the concept of industrial metrology was proposed for the first time, placing higher requirements on the organization, research and development ability, and professional quality of legal metrology verification institutions. The 2020 Proposal of the Central Committee of the Communist Party of China on Formulating the 14th Five-Year Plan for National Economic and Social Development and the Vision Goals of 2035 proposed to “improve the national quality infrastructure, strengthen the standards, metrology, patent and other systems and capacity building, and carry out in-depth quality improvement actions”, further supplementing the strategy of building a “quality power”. In terms of policy as well as technical ability, personnel allocation and comprehensive services, the measurement management system is gradually moving towards market-oriented reform to cope with growing measurement demand and fierce market competition2.

In the field of healthcare, the metrology of medical equipment, as an essential component and safeguard of quality control for medical devices, is increasingly valued by medical institutions at all levels3,4,5,6,7. Currently, China’s metrology management system includes provincial, municipal, and county-level legal metrology technical institutions, which together form the main implementing bodies of legal metrology management. National, provincial, and most prefecture-level legal metrology technical institutions, along with some specialized and industry-specific institutions, such as certain metrology institutes within the military, are the primary bearers of metrology technical management. Legal metrology technical institutions capable of providing business services and inspection and testing institutions in the market jointly offer technical services for metrology economic management. At present, medical institutions in China generally entrust legal metrology verification institutions or qualified non-legal metrology verification institutions to carry out metrology work for medical equipment. In doing so, different medical institutions face different issues. For example, large tertiary general hospitals have high demand for equipment metrology, large annual metrology expenditures, and high time costs; large hospitals outside provincial capitals have limited choices of entrusted verification institutions; and secondary or primary medical institutions often find that their metrology needs are not met in a timely manner. These issues have led many medical institutions to consider establishing internal metrology standards in order to create a convenient, stable, and low-cost long-term metrology service model. However, neither an in-depth exploration nor a comprehensive feasibility evaluation system for the establishment of metrology standards has yet been carried out. 
Establishing metrology standards is therefore an innovative task for medical institutions, and they can draw on feasibility evaluation systems already established in the medical and health fields in China and abroad. In 2020, Madan, J. et al. constructed a structured economic assessment system for health system interventions8. In 2005, Odhiambo-Otieno, G.W. evaluated the feasibility and sustainability of DHMIS in the management of regional health systems (DHS) in Kenya by designing evaluation criteria9. In 2023, Lin, F.Q. et al. built a health status classification evaluation system for hydraulic systems with variable operating conditions based on parameter identification10. In 2013, Tian, B.Q. et al. combined qualitative and quantitative methods to establish a reasonable and simple performance evaluation index system for evaluating the function and performance of community health services11. This paper aims to construct a feasibility evaluation system for establishing metrological standards that provides scientific and fair reference criteria for deciding whether different types of medical institutions need to establish such standards. Doing so requires full consideration of the opinions of relevant personnel, including the heads of medical institutions, medical equipment users, and those responsible for the assessment of measurement standards, as well as factors such as economic conditions, the characteristics of medical institutions, and technical capabilities. This is a multi-objective decision-making problem. Common methods for addressing such problems include the Analytic Hierarchy Process (AHP), decision tree algorithms, swarm intelligence optimization algorithms, and the Network Analytic Hierarchy Process (NAHP), among others. 
Yu, C.S. introduced the application of the GP-AHP method to pairwise comparison problems with triangular, concave, and convex mixed fuzzy estimates in a group decision-making environment in the article “A GP-AHP method for solving group decision-making fuzzy AHP problem”12. Zhang, L.X., in the article “Rockfall hazard assessment of the slope of Mogao Grottoes, China based on AHP, F-AHP and AHP-TOPSIS”, applied AHP, F-AHP and AHP-TOPSIS to rockfall risk assessment of the Mogao Grottoes slope13. Guo, D. described the application of the decision tree algorithm to lumber hierarchies in the article “Application of Decision Tree Algorithm in Lumber Hierarchies”14. Qiao, J.W. et al. applied a hybrid particle swarm optimization algorithm to engineering problems in the paper “A hybrid particle swarm optimization algorithm for solving engineering problem”15. Tang, G. et al. applied the particle swarm optimization algorithm in the paper “Control system research in wave compensation based on particle swarm optimization”16.

Materials and methods

The Analytic Hierarchy Process refers to a systematic approach that treats a complex multi-objective decision-making problem as a system. It involves breaking down the overall goal into multiple sub-goals or criteria, which are further decomposed into several levels of indicators17,18,19,20,21,22. Through a method of qualitative indicator fuzzification and quantification, it calculates the priority rankings at each level and the overall ranking, serving as a systematic method for optimizing decisions involving multiple indicators and alternatives23,24. The Group Decision Making-Analytic Hierarchy Process (GDM-AHP) is an enhanced method based on improvements to the AHP and modern decision theories. It integrates the principles of system clustering analysis and combines qualitative and quantitative methods. This comprehensive analytical approach not only fully reflects the decision-makers’ intentions but also maximizes the elimination of various subjective factors25,26,27,28,29,30,31,32. The decision-making problem regarding the establishment of measurement standards in medical institutions in this study is precisely the type of group decision-making problem that GDM-AHP can address. Therefore, this paper applies GDM-AHP to construct a feasibility evaluation system for the establishment of measurement standards, thereby providing a scientific and fair reference basis for different types of medical institutions to determine whether they need to establish measurement standards. The technical roadmap for constructing a model based on GDM-AHP is shown in Fig. 1.

Fig. 1 The technical roadmap for constructing a model based on GDM-AHP.

Establishment of the GDM-AHP model

Compare the information related to medical equipment with the criteria-level indicators used for evaluation to obtain the corresponding score values, thereby establishing a GDM-AHP model. The specific steps are as follows33,34,35:

1. Determine the main criteria-level and sub-criteria-level indicators for the establishment of measurement standards for medical equipment.

2. Calculate the relative weights of each indicator determined in step 1.

3. Define descriptive levels for each indicator in the sub-criteria layer and calculate the relative intensity of different levels.

4. Set up and calculate the target layer.

5. Conduct sensitivity analysis on the constructed GDM-AHP model.

Determination of criteria layer indicators

The assessment for establishing new measurement standards covers six aspects: metrological standards and supporting equipment, the main metrological characteristics of the standards, environmental conditions and facilities, personnel, document sets, and the measurement capabilities of the standards. All equipment used must be calibrated in accordance with ISO/IEC 17025. For enterprises and institutions, the establishment of measurement standards should be based on actual needs, scientific decision-making, and efficiency, so as to avoid establishing standards arbitrarily. The primary task of medical institutions is to provide medical services to society; they establish measurement standards to ensure the quality control of medical equipment rather than to profit from providing measurement services to the public. At present, China has no system for evaluating the feasibility of establishing measurement standards. This paper draws on relevant research results and practical experience at home and abroad to determine the evaluation indicators that should be considered when deciding whether to establish measurement standards for medical equipment36,37,38,39,40,41,42.

Calculation of the relative weight of each index in the criterion layer

Five experts familiar with the establishment of measurement standards for medical equipment were selected: two with more than 10 years of experience in measurement standards evaluation, two from provincial metrology research institutes with more than 15 years of experience in work related to new measurement standards, and one with more than 10 years of experience in medical equipment measurement management. A questionnaire was sent to these five experts, and each expert made pairwise comparisons of the criterion-layer evaluation indicators. The Saaty scale (1 means the two indicators are equally important, 9 means one is extremely more important than the other) was used to construct the judgment matrices \(\:{A}^{k}\) (k = 1, 2, …, 5) of the criterion layer43. The judgment matrices have the following properties: \(\:{a}_{ij}=\frac{1}{{a}_{ji}}\) and \(\:{a}_{ii}=1\). The comprehensive judgment matrix \(\:\bar{A}\) is obtained by aggregating the five judgment matrices using the weighted geometric mean method:

$$\:{\bar{A}}_{ij}=\prod\:_{k=1}^{5}\:{\left({A}_{ij}^{k}\right)}^{{\beta\:}_{k}}$$
(1)

where \(\:{\beta\:}_{k}\) represents the weight coefficient of the k-th expert; in this study, all five experts have equal weight coefficients \(\:({\beta\:}_{1}={\beta\:}_{2}={\beta\:}_{3}={\beta\:}_{4}={\beta\:}_{5}=0.2)\).

The eigenvector w corresponding to the largest eigenvalue \(\:{\lambda\:}_{max}\) of the comprehensive judgment matrix \(\:\bar{A}\) is then computed; after normalization, its components give the relative weights of the criterion-layer indicators.
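The aggregation in formula (1) and the eigenvector step can be illustrated with a minimal Python sketch. It takes the five expert matrices from the Results section, entered here by their upper triangles, and uses equal expert weights of 0.2. The helper names (`reciprocal_matrix`, `aggregate`, `principal_eigen`) and the choice of power iteration as the eigenvalue solver are assumptions made for this illustration; the paper does not state which solver was used.

```python
from math import prod

def reciprocal_matrix(upper, n):
    """Build an n x n reciprocal matrix from its upper triangle (row-major)."""
    m = [[1.0] * n for _ in range(n)]
    values = iter(upper)
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = float(next(values))
            m[j][i] = 1.0 / m[i][j]      # a_ji = 1 / a_ij
    return m

def aggregate(matrices, betas):
    """Weighted geometric mean of judgment matrices, as in formula (1)."""
    n = len(matrices[0])
    return [[prod(a[i][j] ** b for a, b in zip(matrices, betas))
             for j in range(n)] for i in range(n)]

def principal_eigen(matrix, iterations=200):
    """Power iteration: returns (lambda_max, weight vector summing to 1)."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)                     # w sums to 1, so sum(A w) estimates lambda_max
        w = [x / lam for x in v]
    return lam, w

# Upper triangles (a12, a13, a14, a15, a23, a24, a25, a34, a35, a45)
# of the expert matrices A^1..A^5 listed in the Results section.
t, f = 1 / 3, 1 / 5
UPPER = [
    [t, 3, t, 3, 5, t, 3, f, t, 3],   # A^1
    [1, 3, f, t, 5, t, 5, f, t, 5],   # A^2
    [t, 3, f, 3, 5, 1, 5, f, 1, 5],   # A^3
    [f, 5, t, 5, 5, 1, 7, f, 1, 5],   # A^4
    [t, 5, f, 3, 5, t, 5, f, t, 5],   # A^5
]

experts = [reciprocal_matrix(u, 5) for u in UPPER]
A_bar = aggregate(experts, [0.2] * 5)            # equal expert weights
lambda_max, weights = principal_eigen(A_bar)

print([round(x, 4) for x in A_bar[0]])
print(round(lambda_max, 4), [round(x, 4) for x in weights])
```

The first row of `A_bar` can be compared against the comprehensive judgment matrix reported in the Results (1, 0.3749, 3.6801, 0.2453, 2.1411), and `lambda_max` against the reported value of 5.1562.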

Experts usually assess each evaluation index differently, so the data obtained from the assessment can be used for further study only if the experts’ estimates are consistent. In this paper, the Kendall coefficient of concordance, denoted W, is used to evaluate the consistency of expert opinions; the closer W is to 1, the higher the consistency. When k experts score n evaluation indicators, the results can be arranged in a matrix \(\:E=\Vert{e}_{ij}\Vert\) (i = 1, 2, …, k; j = 1, 2, …, n), where \(\:{e}_{ij}\) is the score of the i-th expert on the j-th evaluation index. With \(\:\bar{e}=\frac{{\sum\:}_{i=1}^{k}\sum\:_{j=1}^{n}{e}_{ij}}{n}\), the consistency coefficient W is calculated as follows:

$$\:W=\frac{12*S}{{k}^{2}({n}^{3}-n)}\:$$
(2)

In the formula (2), the calculation formula of S is as follows:

$$\:S=\sum\:_{j=1}^{n}{\left(\sum\:_{i=1}^{k}{e}_{ij}-\bar{e}\right)}^{2}\:$$
(3)

When the number of evaluation indicators n ≥ 5, the significance of the concordance coefficient can be determined using the χ² criterion, which is calculated as follows:

$$\:{\chi\:}^{2}=W*k*(n-1)$$
(4)

This statistic follows a χ² distribution with ν = n − 1 degrees of freedom. For the chosen significance level α (usually 0.05 or 0.01), the critical value \(\:{\chi\:}_{kr}^{2}\) is read from the χ² distribution table with ν = n − 1 degrees of freedom. If the χ² value obtained from formula (4) is larger than \(\:{\chi\:}_{kr}^{2}\), the experts’ estimates are considered consistent.
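Formulas (2) to (4) can be checked with a short Python sketch applied to the 5 × 5 rank matrix E given in the Results section. The function name `kendall_w` and its `tie_correction` option are assumptions for illustration. The tie-corrected denominator is the standard textbook variant and is not stated in the paper; it is included because formula (2) as printed gives 0.882 for this matrix, whereas the tie-corrected form gives the reported 0.8909 (one expert assigned a tied rank of 4.5 to two indicators). Whether the authors applied this correction is an assumption.

```python
def kendall_w(ranks, tie_correction=False):
    """Kendall's coefficient of concordance for a k x n rank matrix.

    Without tie correction this is formula (2) with S from formula (3).
    With tie_correction=True the denominator is reduced by k * sum(t^3 - t)
    over all groups of tied ranks (standard variant, assumed here).
    """
    k, n = len(ranks), len(ranks[0])
    col_sums = [sum(row[j] for row in ranks) for j in range(n)]
    mean = sum(col_sums) / n                      # e-bar in the text
    s = sum((c - mean) ** 2 for c in col_sums)    # formula (3)
    denom = k ** 2 * (n ** 3 - n)
    if tie_correction:
        ties = 0
        for row in ranks:
            for value in set(row):
                t = row.count(value)
                ties += t ** 3 - t                # zero for untied ranks
        denom -= k * ties
    return 12 * s / denom

# Rank matrix E from the Results section (rows: experts, columns: indicators).
E = [
    [3, 2, 5, 1, 4],
    [4, 2, 5, 1, 3],
    [3, 2, 4.5, 1, 4.5],
    [3, 1, 4, 2, 5],
    [3, 2, 5, 1, 4],
]

w_plain = kendall_w(E)                            # formula (2) as printed
w_tied = kendall_w(E, tie_correction=True)        # standard tie-corrected form
chi2 = w_tied * len(E) * (len(E[0]) - 1)          # formula (4)

print(round(w_plain, 4), round(w_tied, 4), round(chi2, 3))
# prints: 0.882 0.8909 17.818
```

Either value of W leads to the same conclusion at α = 0.05, since the resulting χ² exceeds the critical value of 9.488 for ν = 4.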

Classification of description grades and calculation of their relative strength

In this paper, the expert panel discussed how to grade the levels of each evaluation index. To reduce subjectivity, the levels were compared pairwise, and the resulting weights were normalized to determine the relative strength between levels. Taking \(\:{w}_{i}\) as the relative weight of the i-th level of an evaluation index, its relative intensity \(\:{s}_{i}\) is calculated as follows:

$$\:{s}_{i}=\frac{{w}_{i}}{\max\left({w}_{i}\right)}$$
(5)

Calculation of the feasibility indicator for establishing metrological standards

To apply the criteria-level indicators to the decision-making process for establishing measurement standards for medical equipment, it is necessary to determine a priority score for the establishment of measurement standards for the medical equipment in the model. This score quantitatively measures the feasibility of establishing measurement standards for this category of medical equipment. Define R as the indicator of the feasibility of establishing measurement standards; the larger R is, the more feasible it is to establish them. Let \(\:{\text{w}}_{\text{i}}\) represent the relative weight of the main criteria-level evaluation indicators (where i = 1, 2, …, n, and n is the number of main criteria-level evaluation indicators in the model), and \(\:{\text{w}}_{\text{j}\_\text{i}}\) represent the relative weight of the sub-criteria-level evaluation indicators (where j = 1, 2, …, m, and m is the number of sub-criteria-level evaluation indicators corresponding to the i-th main criteria-level evaluation indicator in the model). Additionally, each sub-criteria-level indicator in the model corresponds to a unique descriptive level, whose relative intensity is denoted \(\:{\text{s}}_{\text{k}\_\text{j}\_\text{i}}\) (where k = 1, 2, …, q, and q is the number of descriptive levels corresponding to the j-th sub-criteria-level evaluation indicator of the i-th main criteria-level evaluation indicator in the model). The formula for calculating the feasibility indicator R for the establishment of measurement standards for medical equipment is as follows:

$$\:R=\sum\:_{i=1}^{n}\sum\:_{j=1}^{m}{w}_{i}{w}_{j\_i}{s}_{k\_j\_i}$$
(6)

Considering the relative weights of all evaluation indicators, the feasibility indicator R for the establishment of measurement standards for medical equipment is divided into intervals as follows: when R ≥ 0.5, the equipment has a high priority for establishing measurement standards; when 0.4 ≤ R < 0.5, the equipment has a medium priority for establishing measurement standards; when R < 0.4, the equipment has a low priority for establishing measurement standards.
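As an illustration of formulas (5) and (6) and the interval rule above, the following Python sketch computes R for one device and maps it to a priority class. All numerical inputs are hypothetical, since the weights and intensities of Table 2 are not reproduced here, and the function names are chosen only for this example.

```python
def relative_intensity(level_weights):
    """Formula (5): scale level weights so the strongest level equals 1."""
    top = max(level_weights)
    return [w / top for w in level_weights]

def feasibility(main_weights, sub_weights, intensities):
    """Formula (6): R = sum_i sum_j w_i * w_j_i * s_k_j_i.

    sub_weights[i][j] is the weight of sub-indicator j under main criterion i;
    intensities[i][j] is the relative intensity of the descriptive level that
    applies to the device for that sub-indicator.
    """
    return sum(
        w_i * w_ji * s
        for w_i, subs, levels in zip(main_weights, sub_weights, intensities)
        for w_ji, s in zip(subs, levels)
    )

def priority(r):
    """Interval rule from the text: high (R >= 0.5), medium, or low (R < 0.4)."""
    if r >= 0.5:
        return "high"
    if r >= 0.4:
        return "medium"
    return "low"

# Hypothetical model with two main criteria (not the paper's Table 2 values).
levels = relative_intensity([0.60, 0.30, 0.10])   # -> 1.0, 0.5, 0.1667 (approx.)
main_weights = [0.6, 0.4]
sub_weights = [[0.7, 0.3], [1.0]]
intensities = [[levels[0], levels[1]], [levels[1]]]

r = feasibility(main_weights, sub_weights, intensities)
print(round(r, 4), priority(r))
# 0.6*0.7*1.0 + 0.6*0.3*0.5 + 0.4*1.0*0.5 = 0.71 -> "high"
```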

Sensitivity analysis of GDM-AHP model

In the constructed GDM-AHP model, different evaluation indicators have varying degrees of influence on the results due to their weights. Therefore, it is necessary to conduct a sensitivity analysis of the weights of each evaluation indicator in the model. This analysis determines the impact of changes in the weights of the evaluation indicators on the feasibility indicator R for the establishment of measurement standards for medical equipment. By analyzing the sensitivity of the evaluation indicators, initial weights can be adjusted to enhance the stability and reliability of the constructed model44,45,46.

In this paper, the Perturbation Method is employed to conduct sensitivity analysis on the main criteria-level and sub-criteria-level evaluation indicators in the constructed GDM-AHP model47,48,49. The initial weights of the evaluation indicators are obtained from the relative weights acquired through the steps mentioned above. The sensitivity analysis of the main criteria-level evaluation indicators is conducted under the condition that the internal initial weights of the sub-criteria-level evaluation indicators remain unchanged, considering the impact of changes in the weights of the main criteria-level evaluation indicators on the feasibility indicator R for the establishment of measurement standards for medical equipment. The sensitivity analysis of the sub-criteria-level evaluation indicators is conducted while keeping the initial weights of the main criteria-level evaluation indicators constant, considering the impact on the feasibility indicator R when the internal weight distribution of the sub-criteria-level evaluation indicators varies. Based on the results of the sensitivity analysis of the evaluation indicators, the initial relative weights of each evaluation indicator are adjusted to obtain reasonable weight values for each evaluation indicator in the GDM-AHP model.

Results

Determination of criteria layer indicators in the constructed GDM-AHP model

Medical institution perspective

The nature and scale of medical institutions directly determine the talent and technical strength of hospitals, the conditions of medical hardware equipment, and the total number of devices requiring metrological calibration. These are important indicators for evaluating the feasibility of establishing metrological standards, and should specifically include the following aspects.

(1) Scale of medical institutions: According to the medical service capabilities and facility conditions, medical institutions in China are categorized into three levels: primary, secondary, and tertiary. The higher the level of the medical institution, the greater the requirements for medical equipment management and the workload for metrological calibration50,51.

(2) Allocation of relevant technical personnel: According to the “Metrological Standards Assessment Measures,” engineers need relevant qualifications to conduct metrological service work. The allocation of engineers in medical institutions is also an indicator to consider in establishing metrological standards40.

(3) Standard instrument configuration: The configuration of standard instruments is an essential condition for establishing metrological standards. Medical equipment that already has standard instruments has an advantage in establishing metrological standards.

(4) Willingness to establish metrological standards: The willingness of the medical equipment management department leaders and senior hospital leaders to establish metrological standards will directly determine the feasibility of the entire work.

Medical equipment perspective

According to different classification standards, medical equipment can be categorized into mandatory calibration equipment and non-mandatory calibration equipment, as well as high-risk, medium-risk, and low-risk equipment52,53. Different equipment attributes affect the feasibility of establishing metrological standards.

(1) Mandatory calibration equipment: In medical institutions, medical equipment includes both mandatory calibration and non-mandatory calibration equipment. According to the “Metrology Law,” for mandatory calibration instruments, user organizations must apply for calibration to designated metrological verification institutions as required; for non-mandatory calibration instruments, user organizations may calibrate them regularly themselves according to actual needs or send them to other metrological verification institutions. Establishing metrological standards is therefore most relevant to medical institutions for non-mandatory calibration equipment.

(2) Metrological demand: Depending on the use and risk level of medical equipment, the management and calibration requirements vary. Devices such as life support equipment and those with higher risk levels have greater metrological demands, making the establishment of metrological standards more necessary.

(3) Cost of individual calibration: High expenditure on metrological work is a common issue in current metrological verification work. The higher the cost of calibrating a piece of equipment, the greater the necessity of establishing metrological standards.

(4) Current metrological technical regulations: Establishing metrological standards is less difficult for equipment that is covered by existing metrological technical regulations.

Technical conditions perspective

In the assessment of establishing new metrological standards, the implementation environment of metrological standards and the confirmation of metrological measurement capabilities are key evaluation criteria. Therefore, the technical conditions perspective in this system includes two indicators: metrological calibration operation techniques and conditions for metrological standard preservation and maintenance54:

(1) Metrological operation techniques: Proficiency in metrological calibration operation techniques is fundamental to conducting metrological calibration work.

(2) Conditions for metrological standard preservation and maintenance: This includes environmental conditions for storing standard instruments and supporting facilities, as well as the traceability management of metrological standards.

Economic perspective

Economic cost is one of the main indicators of the feasibility of establishing metrological standards. This paper uses the payback period method to assess the economic benefits of establishing metrological standards for a certain type of medical equipment: Payback Period = Total Investment / Annual Expenditure on Metrological Calibration for such equipment55,56. Total investment refers to the cost of establishing metrological standards, including the cost of metrological standard devices, supporting equipment, costs for training technical personnel, etc. Annual expenditure on metrological calibration for such equipment refers to the current annual expenditure on metrological work for these types of equipment. A shorter payback period indicates lower risk and better economic benefits.
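The payback period calculation above reduces to a single division, as the following minimal Python sketch shows. The amounts are hypothetical and serve only to illustrate the units (currency divided by currency per year gives years); the function name is an assumption.

```python
def payback_period(total_investment, annual_calibration_spend):
    """Payback period in years = total investment / current annual spend."""
    if annual_calibration_spend <= 0:
        raise ValueError("annual calibration expenditure must be positive")
    return total_investment / annual_calibration_spend

# Hypothetical example: 300,000 invested in standard devices, supporting
# equipment and training, against 120,000 per year currently paid for
# outsourced calibration of the same type of equipment.
years = payback_period(300_000, 120_000)
print(years)  # 2.5
```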

Outsourced verification agency perspective

When evaluating the feasibility of establishing metrological standards, the current situation of outsourced verification agencies is also an important consideration. The choice of outsourced verification agency for most medical institutions depends on factors such as institutional preferences, economic costs, etc. However, for smaller-scale institutions or those located in remote areas, there may be limitations or even exclusivity in choosing a verification agency. Additionally, there may be issues such as slow response to metrological demands and low efficiency in carrying out verification work.

(1) Attributes of outsourced verification agencies: Currently, verification agencies include statutory and non-statutory verification agencies. Institutions with limited choices of verification agencies tend to have a stronger inclination towards establishing metrological standards.

(2) Response to metrological demand: The urgency of establishing metrological standards is inversely proportional to the speed at which verification agencies respond to metrological demands.

(3) Efficiency in carrying out metrological work: The urgency of establishing metrological standards is inversely proportional to the efficiency with which verification agencies carry out metrological work.

The initial weights of evaluation indexes in the constructed GDM-AHP model

According to the evaluation indicators determined above, n = 5 in formula (6), and the value of m corresponding to each value of i is shown in Table 1.

Table 1 Correspondence between values of i and m.

The judgment matrices \(\:{A}^{1}\), \(\:{A}^{2}\), \(\:{A}^{3}\), \(\:{A}^{4}\) and \(\:{A}^{5}\) given by the five experts for the main criterion layer in constructing the GDM-AHP model are as follows:

$$\:{A}^{1}=\left[\begin{array}{cc}\begin{array}{ccc}1&\:1/3&\:3\\\:3&\:1&\:5\\\:1/3&\:1/5&\:1\end{array}&\:\begin{array}{cc}1/3&\:3\\\:1/3&\:3\\\:1/5&\:1/3\end{array}\\\:\begin{array}{ccc}3&\:3&\:5\\\:1/3&\:1/3&\:3\end{array}&\:\begin{array}{cc}1&\:3\\\:1/3&\:1\end{array}\end{array}\right]$$
$$\:{A}^{2}=\left[\begin{array}{cc}\begin{array}{ccc}1&\:1&\:3\\\:1&\:1&\:5\\\:1/3&\:1/5&\:1\end{array}&\:\begin{array}{cc}1/5&\:1/3\\\:1/3&\:5\\\:1/5&\:1/3\end{array}\\\:\begin{array}{ccc}5&\:3&\:5\\\:3&\:1/5&\:3\end{array}&\:\begin{array}{cc}1&\:5\\\:1/5&\:1\end{array}\end{array}\right]$$
$$\:{A}^{3}=\left[\begin{array}{cc}\begin{array}{ccc}1&\:1/3&\:3\\\:3&\:1&\:5\\\:1/3&\:1/5&\:1\end{array}&\:\begin{array}{cc}1/5&\:3\\\:1&\:5\\\:1/5&\:1\end{array}\\\:\begin{array}{ccc}5&\:1&\:5\\\:1/3&\:1/5&\:1\end{array}&\:\begin{array}{cc}1&\:5\\\:1/5&\:1\end{array}\end{array}\right]$$
$$\:{A}^{4}=\left[\begin{array}{cc}\begin{array}{ccc}1&\:1/5&\:5\\\:5&\:1&\:5\\\:1/5&\:1/5&\:1\end{array}&\:\begin{array}{cc}1/3&\:5\\\:1&\:7\\\:1/5&\:1\end{array}\\\:\begin{array}{ccc}3&\:1&\:5\\\:1/5&\:1/7&\:1\end{array}&\:\begin{array}{cc}1&\:5\\\:1/5&\:1\end{array}\end{array}\right]$$
$$\:{A}^{5}=\left[\begin{array}{cc}\begin{array}{ccc}1&\:1/3&\:5\\\:3&\:1&\:5\\\:1/5&\:1/5&\:1\end{array}&\:\begin{array}{cc}1/5&\:3\\\:1/3&\:5\\\:1/5&\:1/3\end{array}\\\:\begin{array}{ccc}5&\:3&\:5\\\:1/3&\:1/5&\:3\end{array}&\:\begin{array}{cc}1&\:5\\\:1/5&\:1\end{array}\end{array}\right]$$

The eigenvector \(\:{w}_{k}\) (k = 1, 2, …, 5) corresponding to the largest eigenvalue of \(\:{A}^{k}\) (k = 1, 2, …, 5) is calculated for each expert, and the elements of \(\:{w}_{k}\) represent the k-th expert’s scores for the evaluation indexes. Each expert’s scores \(\:{w}_{k}\) are converted into rank order, forming a 5 × 5 matrix E:

$$\:E=\left[\begin{array}{cc}\begin{array}{ccc}3&\:2&\:5\\\:4&\:2&\:5\\\:3&\:2&\:4.5\end{array}&\:\begin{array}{cc}1&\:4\\\:1&\:3\\\:1&\:4.5\end{array}\\\:\begin{array}{ccc}3&\:1&\:4\\\:3&\:2&\:5\end{array}&\:\begin{array}{cc}2&\:5\\\:1&\:4\end{array}\end{array}\right]$$

The concordance coefficient W = 0.8909 and χ² = 17.818 were calculated using formulas (2)-(4), while the critical value \(\:{\chi\:}_{kr}^{2}\), taken from the distribution table with ν = 5 − 1 = 4 degrees of freedom and significance level α = 0.05, was 9.488. Because the obtained χ² value is considerably larger than the critical value, the experts’ estimates are considered consistent.

According to formula (1), the comprehensive judgment matrix is obtained as follows:

$$\:\bar{A}=\left[\begin{array}{cc}\begin{array}{ccc}1&\:0.3749&\:3.6801\\\:2.6673&\:1&\:5\\\:0.2717&\:0.2000&\:1\end{array}&\:\begin{array}{cc}0.2453&\:2.1411\\\:0.5173&\:4.8287\\\:0.2000&\:0.5173\end{array}\\\:\begin{array}{ccc}4.0760&\:1.9332&\:5\\\:0.4670&\:0.2071&\:1.9332\end{array}&\:\begin{array}{cc}1&\:4.5144\\\:0.2215&\:1\end{array}\end{array}\right]$$

The maximum eigenvalue of the comprehensive judgment matrix \(\:\bar{A}\) is \(\:{\lambda\:}_{max}=5.1562\), and the consistency of the judgment matrix is verified through the Random Consistency Ratio (CR)57,58,59,60, which is calculated using the following formula:

$$\:CR=\frac{CI}{RI}$$
(7)

In the formula, the Consistency Index (CI) is calculated as \(\:CI=\frac{{\lambda}_{max}-n}{n-1}\), and when the order of the matrix n = 5, the Random Consistency Index (RI) is found to be 1.12 by consulting the relevant tables.
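As a worked step, substituting the reported \(\:{\lambda\:}_{max}=5.1562\), n = 5 and RI = 1.12 into these definitions gives (intermediate value rounded to five decimal places):

$$\:CI=\frac{5.1562-5}{5-1}=0.03905,\qquad CR=\frac{0.03905}{1.12}\approx\:0.03487$$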

According to formula (7), CR = 0.03487 < 0.1 is obtained, so the judgment matrix has satisfactory consistency, and the eigenvector corresponding to \(\:{\lambda\:}_{max}\) is the relative weight coefficient of each index in the main criterion layer after normalization. Similarly, the relevant weights of each index and the relative weights of each description level in the sub-criteria layer can be calculated, and then the relative strength of each description level can be calculated through formula (5), as shown in Table 2.

Table 2 Relative weight of criterion-level indexes and relative strength of description levels.

Sensitivity analysis results of GDM-AHP model

Data on 8 common types of medical equipment in a Grade 3A general hospital were used as the original data. The constructed GDM-AHP model was used to calculate the feasibility index R for establishing measurement standards for each of the 8 equipment types, together with the proportion of R values falling in each value interval. Sensitivity analysis was then carried out on the index weights at two levels: the evaluation indices of the sub-criterion layer and those of the main criterion layer. The initial weights of the evaluation indices are shown in Table 2.
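To make the interval proportions concrete, the sketch below bins a set of R values into the three intervals used in the sensitivity analysis of this section (R ≥ 0.5, 0.4 ≤ R < 0.5, and R < 0.4). The eight R values are hypothetical placeholders, not the hospital's data, and the function name is illustrative.

```python
def interval_proportions(r_values):
    """Share of R values in each of the three intervals used in the analysis."""
    n = len(r_values)
    high = sum(1 for r in r_values if r >= 0.5)
    middle = sum(1 for r in r_values if 0.4 <= r < 0.5)
    low = sum(1 for r in r_values if r < 0.4)
    return {"R>=0.5": high / n, "0.4<=R<0.5": middle / n, "R<0.4": low / n}


# Hypothetical R values for 8 equipment types (illustration only).
baseline = [0.62, 0.55, 0.51, 0.47, 0.44, 0.41, 0.38, 0.35]
# Same values after a perturbation pushes one device from 0.41 to 0.39.
perturbed = [0.62, 0.55, 0.51, 0.47, 0.44, 0.39, 0.38, 0.35]

print(interval_proportions(baseline))
print(interval_proportions(perturbed))
```

With 8 equipment types, a single device crossing an interval boundary shifts two proportions by 1/8 = 12.5%, which is the size of change discussed in the sensitivity results.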

Results of sensitivity analysis of sub-criterion level evaluation index

The model constructed in this paper includes 5 main criteria-level evaluation indicators, so a sensitivity analysis is required for the 14 sub-criteria-level evaluation indicators grouped under them. The following takes the sensitivity analysis of the weights of the 4 sub-criteria-level evaluation indicators at the medical institution level as an example.

When the weights of related technical personnel configuration, standard instrument configuration, willingness to establish measurement standards, and the scale level of medical institutions are perturbed, the weights of the 5 main criteria-level indicators and the remaining 10 sub-criteria-level evaluation indicators are kept at the values shown in Table 2. The weight of the evaluation indicator with the smallest initial weight, the scale level of medical institutions, is held fixed. The weights of related technical personnel configuration and standard instrument configuration are perturbed in steps of 5% of their initial weights over a range from −20% to 20% of the initial weights, which gives 9 values for each weight and 9 × 9 = 81 combinations. The weight of the willingness to establish measurement standards is then obtained from the constraint that the 4 weights sum to 1. The distribution of the feasibility indicator R for the establishment of measurement standards for medical equipment, calculated for the 81 perturbation combinations, is shown in Fig. 2.

When the weights of related technical personnel configuration, standard instrument configuration, and willingness to establish measurement standards change, the proportion of the R ≥ 0.5 interval remains unchanged. When the perturbation of the weight of related technical personnel configuration exceeds −15% of the initial weight, the proportion of R in each interval changes little, regardless of how the weights of standard instrument configuration and willingness to establish measurement standards change. When the perturbation of the weight of related technical personnel configuration is −20% of the initial weight and that of standard instrument configuration is −5% of the initial weight (number 4), the proportions of the intervals 0.4 ≤ R < 0.5 and R < 0.4 change significantly, decreasing by 12.5% and increasing by 12.5%, respectively.

It can be seen that the weight of related technical personnel configuration should be between 80% and 85% of the initial weight, and the weight of standard instrument configuration should be between 90% and 95% of the initial weight; otherwise, their impact will not be reflected.
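The perturbation scheme described above can be sketched as a small grid generator. The initial weights below are hypothetical (the actual values are those of Table 2), and the dictionary keys and function name are illustrative. Two weights are scaled in 5% steps from −20% to +20%, the smallest weight is held fixed, and the remaining weight absorbs the difference so that the four weights sum to 1.

```python
from itertools import product

# Hypothetical initial weights of the four medical-institution-level
# sub-criteria (placeholders for the Table 2 values); they sum to 1.
INITIAL = {
    "personnel": 0.35,
    "standard_instruments": 0.30,
    "willingness": 0.25,
    "scale_level": 0.10,  # smallest weight, held fixed
}

STEPS_PERCENT = range(-20, 25, 5)  # -20%, -15%, ..., +20% (9 values)


def perturbation_grid(initial):
    """Return the 9 x 9 = 81 perturbed weight sets."""
    grid = []
    for d_personnel, d_instruments in product(STEPS_PERCENT, repeat=2):
        personnel = initial["personnel"] * (1 + d_personnel / 100)
        instruments = initial["standard_instruments"] * (1 + d_instruments / 100)
        fixed = initial["scale_level"]
        willingness = 1.0 - personnel - instruments - fixed  # residual weight
        grid.append({
            "delta": (d_personnel, d_instruments),
            "personnel": personnel,
            "standard_instruments": instruments,
            "willingness": willingness,
            "scale_level": fixed,
        })
    return grid


grid = perturbation_grid(INITIAL)
print(len(grid), grid[3]["delta"])
```

With the personnel perturbation in the outer loop, the fourth combination is (−20%, −5%), which matches the case labelled number 4 in the text. Each weight set would then be passed to the R calculation, which is omitted here because it depends on formulas defined elsewhere in the paper.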

Fig. 2 Change in the proportion of the feasibility evaluation index R value range for the establishment of medical equipment metrological standards when the weight of relevant technical personnel configuration indicator increases.

The sensitivity analyses of the weights of the 4 sub-criteria-level evaluation indicators at the medical equipment level, the 2 sub-criteria-level evaluation indicators at the technical condition level, and the 3 sub-criteria-level evaluation indicators at the entrusted verification institution level all follow the same procedure as described above for the medical institution level. The distributions of the feasibility indicator R for the establishment of measurement standards for medical equipment, calculated from the different perturbation amounts, are shown in Figs. 3, 4 and 5, respectively. The analysis concludes that the weight of whether the equipment is a mandatory verification device should be between 80% and 85% of the initial weight, and the weight of whether there is a measurement demand should be between 90% and 100% of the initial weight; the weight of measurement operation technology should be between 115% and 120% of the initial weight; and the weights of the 3 sub-criteria-level evaluation indicators at the entrusted verification institution level should remain unchanged.

Fig. 3 Change in the proportion of the feasibility evaluation index R value range for the establishment of medical equipment metrological standards when the weight of mandatory calibration equipment indicator increases.

Fig. 4 Change in the proportion of the feasibility evaluation index R value range for the establishment of medical equipment metrological standards when the weight of measurement operation skills indicator increases.

Fig. 5 Change in the proportion of the feasibility evaluation index R value range for the establishment of medical equipment metrological standards when the weight of calibration work efficiency indicator increases.

Results of sensitivity analysis of main criterion level evaluation index

When the weights at the economic level, medical equipment level, medical institution level, entrusted verification institution level, and technical condition level are perturbed, the internal weights of the sub-criteria-level evaluation indicators are kept at their initial values, as shown in Table 2. The weight of the evaluation indicator with the smallest initial weight, the technical condition level, is held fixed. The weights of the three evaluation indicators at the economic level, medical equipment level, and medical institution level are perturbed in steps of 5% of their initial weights over a range from −20% to 20% of the initial weights, which gives 9 × 9 × 9 = 729 combinations. The weight of the entrusted verification institution level is obtained from the constraint that the 5 weights sum to 1. The distribution of the feasibility indicator R for the establishment of measurement standards for medical equipment, calculated for the 729 perturbation combinations, is shown in Fig. 6. The analysis concludes that the weight at the economic level should be between 80% and 85% of the initial weight, the weight at the medical equipment level should be 85% of the initial weight, and the weight at the medical institution level should be 95% of the initial weight.
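The scheme can be written compactly as follows. The notation is introduced here for illustration only and does not appear in the original formulas: \(w_E, w_M, w_I\) denote the initial weights at the economic, medical equipment, and medical institution levels, \(w_T\) the fixed technical condition weight, and \(w_V'\) the resulting weight at the entrusted verification institution level.

$$w_i' = w_i\,(1+\delta_i),\qquad \delta_i\in\{-0.20,\,-0.15,\,\dots,\,0.15,\,0.20\},\qquad i\in\{E,M,I\}$$

$$w_V' = 1 - w_T - \sum_{i\in\{E,M,I\}} w_i'$$

Each \(\delta_i\) takes 9 values independently, so the number of weight sets evaluated is \(9^{3}=729\).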

Fig. 6 Change in the proportion of the feasibility evaluation index R value range for the establishment of medical equipment metrological standards when the weight of economic level indicator increases.

Determination of evaluation index weights in GDM-AHP model

The reasonable value range of the weight of each evaluation index was obtained by the perturbation method, and the initial weights were modified accordingly. The weight of each evaluation index after modification is shown in Table 3.

Table 3 The weight of each evaluation index after modification.

Application of GDM-AHP model

The model with adjusted indicator weights was applied to the feasibility evaluation of measurement standard establishment for 8 different types of medical equipment in 7 medical institutions. Analysis of variance (ANOVA) was conducted on the differences in the R values among the 7 medical institutions using the data analysis software IBM SPSS Statistics 27, and the results are shown in Table 4. The F value is the ratio of the between-group variance to the within-group variance of the R values of the 7 medical institutions and measures the magnitude of the differences between groups relative to the differences within groups. A larger F value indicates that the between-group differences are large compared with the within-group differences. If the P value is less than 0.05, the between-group differences are considered statistically significant61,62,63.
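The F ratio described above can be illustrated with a minimal one-way ANOVA sketch. This is not the SPSS procedure used in the study, and the exact ANOVA design is an assumption; the R values below are hypothetical, and only three institutions are shown for brevity. The P value is omitted because it requires the F distribution, which a statistics package would supply.

```python
def one_way_anova_f(groups):
    """Return F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical R values (one list per institution, one value per equipment type).
r_values = [
    [0.58, 0.61, 0.55, 0.47],
    [0.44, 0.41, 0.46, 0.39],
    [0.33, 0.36, 0.31, 0.35],
]
print(f"F = {one_way_anova_f(r_values):.2f}")
```

The degrees of freedom are k − 1 between groups and N − k within groups, where k is the number of institutions and N the total number of R values.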

Table 4 Feasibility assessment weight values (R) for establishing metrological standards for different types of medical equipment in various medical institutions.

Discussion

Comparative analysis of overall situations

The data in Table 4 show significant differences in the feasibility indicator R values for the establishment of measurement standards for medical equipment among the 7 medical institutions (P < 0.05). Moreover, the R values for equipment in tertiary medical institutions are generally higher than those in secondary and primary medical institutions, indicating that the type of medical institution affects the R values. In addition, although multi-parameter monitors are the most numerous of the eight device types, the R values of the multi-parameter monitors in all seven medical institutions are less than 0.5. The main reason is that multi-parameter monitors are measuring instruments subject to compulsory verification: they must be submitted to the statutory metrological verification institutions for verification, and this measurement service is free. Therefore, medical institutions do not need to establish measurement standards for multi-parameter monitors.

Comparative analysis of different medical institutions

  1. Comparison between A1 and A2: Both A1 and A2 are tertiary comprehensive hospitals, and there is no significant difference in the feasibility assessment weight values (R) for establishing metrological standards of medical equipment (P = 0.105 > 0.05). This similarity arises from the comparable scale, nature, types and quantities of medical equipment, availability of standards, personnel and technical conditions, and metrological demands in both institutions. However, across all equipment types, A2 consistently shows higher R values than A1. This difference arises because A2 is located in a non-provincial capital area: it has fewer entrusted verification institutions to choose from than A1, and metrology verification institutions respond to its metrology needs less quickly, so its demand for establishing metrology standards is higher. Within the budget range of the medical institutions, A1 considers establishing measurement standards for infusion pumps, while A2 considers establishing measurement standards, in order of priority from high to low, for infusion pumps, ventilators, defibrillators, biosafety cabinets, high-frequency electrosurgical units, and PCR instruments.

  2. Comparison between A3 and A4: Both A3 and A4 are tertiary specialized hospitals, yet they differ in the types of equipment considered for establishing metrological standards, reflecting their respective specializations. The pulmonary specialized tertiary hospital (A3) considers ventilators for establishing metrological standards, whereas the maternal and child specialized tertiary hospital (A4) considers infant incubators.

  3. Comparison between B1 and B2: Both B1 and B2 are secondary comprehensive hospitals, and for most types of equipment, the feasibility assessment weight values (R) for establishing metrological standards are below 0.5, indicating minimal necessity for establishing metrological standards. There was also no significant difference in the R values between the two medical institutions (P = 0.468 > 0.05), because their scale and nature, the types and quantities of medical equipment, standards, personnel and technical conditions, and measurement needs were similar.

  4. Analysis of C1: C1 is a primary healthcare institution where the feasibility assessment weight values (R) for establishing metrological standards for each type of equipment are below 0.5, indicating that the priority of establishing measurement standards is low. This result is influenced by factors such as the institution’s scale, metrological demands, and technical and personnel conditions.
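The comparisons above imply a simple decision rule: equipment with R below 0.5 is not prioritized, and the remaining equipment is ranked from the highest R downward. The sketch below assumes that 0.5 is the working cut-off suggested by this discussion; the R values are hypothetical and were chosen only so that the ranking mirrors the priority order reported for A2.

```python
R_THRESHOLD = 0.5  # assumed cut-off, taken from the discussion of R < 0.5


def prioritize(r_by_equipment, threshold=R_THRESHOLD):
    """Return equipment names with R >= threshold, highest R first."""
    candidates = [(name, r) for name, r in r_by_equipment.items() if r >= threshold]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates]


# Hypothetical R values for one institution (illustration only).
r_a2 = {
    "multi-parameter monitor": 0.42,
    "infusion pump": 0.71,
    "ventilator": 0.66,
    "defibrillator": 0.61,
    "biosafety cabinet": 0.57,
    "high-frequency electrosurgical unit": 0.54,
    "PCR instrument": 0.52,
    "infant incubator": 0.45,
}
for rank, name in enumerate(prioritize(r_a2), start=1):
    print(rank, name)
```

In practice the ranked list would still be filtered by the institution's budget, as noted for A1 and A2.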

Limitations of the constructed model

First, the model constructed in this study applies to medical equipment in public medical institutions, so its applicability to medical equipment in private medical institutions or enterprises is limited. Second, there are at present no research results on a feasibility evaluation system for establishing measurement standards for medical equipment in China. As a preliminary exploratory study, this study selected only 5 experts, which is the minimum number for such a study. In addition, the data were collected by questionnaire survey, which may be subjective to some extent and may be influenced by the respondents’ answer bias. Therefore, further confirmatory studies, or studies requiring a high degree of reliability, may need more experts to reduce the influence of subjectivity and bias on the results.

In addition, sensitivity analysis is a very important step in this study. The perturbation method used here is computationally efficient and easy to implement, but its limitations also limit the results of this study. The method perturbs each weight by a fixed step size within a given range, so the accuracy of the calculation is relatively limited. When the model contains a large number of parameters, the results of the perturbation method may be difficult to interpret, which may make it difficult for decision makers to act on them. To avoid this situation, the evaluation index with the lowest initial weight is held fixed during perturbation, which may itself introduce errors into the results. Finally, the perturbation method assumes that the evaluation indices are independent. Therefore, if the criterion-layer evaluation indices are re-determined and the new indices are correlated, the perturbation method cannot be used for sensitivity analysis.

Conclusions

For medical institutions, establishing measurement standards is long-term work. Besides the assessment and use of new measurement standards, the subsequent preservation and maintenance of the standards must also be considered, which places higher requirements on the technical conditions and staffing of the unit that establishes them. At present, apart from military hospitals, domestic medical institutions rarely establish measurement standards for medical equipment; establishing them is therefore innovative work that faces considerable challenges. Based on GDM-AHP, this paper constructs a feasibility evaluation system for establishing measurement standards. The system comprises 5 main criterion layer indicators, such as medical institutions, medical equipment, and technical conditions, and 14 sub-criterion layer indicators, such as the scale level of medical institutions, the configuration of related technical personnel, and the configuration of standard instruments, with sub-criterion layer indicators set for each main criterion layer. The relative weight of each indicator is calculated through pairwise comparison, and a sensitivity analysis of each indicator weight is carried out using the perturbation method. The GDM-AHP model with adjusted indicator weights is then applied to decisions on establishing measurement standards for different equipment in different medical institutions. In this way, the discussion of the feasibility of establishing measurement standards in medical institutions is transformed into a quantitative multi-indicator evaluation problem, which makes a decision problem that is difficult to quantify more scientific and objective64,65.

Several aspects need further study in future work. First, the evaluation object of this system is a single type of equipment in a single institution. As the measurement demand and other conditions for that type of equipment change, the value of the feasibility evaluation index for establishing measurement standards also changes; that is, the R value is dynamic. It is therefore worth considering how to optimize the threshold setting of the R indicator to make decisions on establishing measurement standards more rational. Second, given the limitations of the perturbation method noted in the discussion section, how to combine it with other sensitivity analysis methods (such as the finite difference method and the direct differential method) to improve the accuracy and reliability of the results needs further exploration. Finally, the applicability of this study to medical equipment in private medical institutions or enterprises is limited. How to update the criterion layer evaluation indicators to broaden the applicability of the model is also a direction for future research66,67.