Introduction

Table tennis is one of China’s traditional competitive sports, and its enduring success can be attributed to generations of relentless practice and technological innovation1. As a sport centered on technical and tactical skill, table tennis attracted significant attention from coaches and researchers as early as the 1960s2,3. The classic theory of tactical analysis, the “Three-Phase Index Evaluation Method” proposed by Wu et al.4, played a pivotal role in strengthening the technical system of China’s table tennis team and remains in use to this day. In recent years, sports researchers have innovated and reformed various methods for analyzing table tennis tactics, and Chinese work on tactical analysis now accounts for a significant portion of the world’s table tennis literature. For instance, Li et al.5 proposed the “Ten-Index Evaluation Method.” Li6 introduced the concepts of “contribution rate” and “rapid diagnosis of technical and tactical elements” in his dissertation, providing reliable quantitative support for technical and tactical diagnosis. Zhang et al.7,8 established the concept of Technical Effectiveness (TE) based on the relationship between scoring rates and utilization rates, and presented evaluation standards for the three-phase index. Wu et al.9 introduced the “New Three-Phase Index Statistical Method,” which resolves incongruities between one party’s serving phase and the other party’s receiving phase, although the method is relatively complex. Subsequently, Yang and Zhang10,11 developed the “Four-Phase Index Evaluation Method,” offering a clearer resolution to the data mismatch present in the classic analytical theories. They further introduced an evaluation standard for “technical strength differences,” enriching the rational selection of technical and tactical indicators. 
Yin et al.12 extended the “Four-Phase Index Technical Diagnosis Formula,” effectively addressing the issue of incomplete rapid diagnosis of the four-Phase technical and tactical indicators. Moreover, interdisciplinary fusion of information technology further propelled the application of various research methods in table tennis tactics, including artificial neural networks13,14, grey relational analysis15,16, decision tree algorithms17, data mining18,19, TOPSIS20, deep learning theories21,22,23, and secondary moving average methods24,25. These studies have significantly contributed to predicting the winning rules of table tennis matches and have greatly promoted methodological innovation, providing research support and assurance for China’s continued leading position in world table tennis.

Principal Component Analysis (PCA) is a multivariate statistical method that uses dimension reduction to derive a few comprehensive indicators summarizing the information in the original variables26. Its advantage lies in exploring complex issues with only a few principal components: it retains the main information from the original data, focuses on primary contradictions, avoids multicollinearity among variables, and enhances analytical efficiency27. However, it also has limitations: when factor loadings carry both positive and negative signs, the meaning of the comprehensive evaluation function becomes ambiguous, which reduces evaluation accuracy. Therefore, to reflect comprehensive evaluation results clearly and present concise conclusions, PCA is generally combined with Cluster Analysis (CA)28. Currently, PCA and CA are applied mainly in areas such as performance management, public health, and sociology; their use in the sports sciences remains limited and is still in its infancy29.

Given this background, this study introduces a combination of Principal Component Analysis and Cluster Analysis to evaluate the match performance of selected highly active, high-profile male table tennis players worldwide. Data are drawn from important individual events organized by the ITTF from 2018 to 2021, comprising 40 matches with a total of 226 games and 4177 points. The “New Four-Phase Index Statistical Method” is employed to assess the Technical Effectiveness (TE) of the observation indicators corresponding to each phase. The aims are to identify the pivotal “keys” to winning table tennis matches, to examine the reliability and practical value of the “New Four-Phase Index Statistical Method” in technical and tactical statistics, and to construct a comprehensive evaluation model for the technical and tactical effectiveness indicators of table tennis matches using Principal Component Analysis in conjunction with Cluster Analysis. This research seeks to provide a theoretical reference for devising appropriate technical and tactical training plans for high-level table tennis teams in various countries and regions.

Methods

Data sources

This study used ITTF match videos featuring eight highly active and prominent male players on the current international table tennis scene. The players come from China, Japan, South Korea, and selected European countries, and hold world rankings between 1 and 20. Each is the highest-ranking athlete from his region and competes regularly in events organized by the International Table Tennis Federation. One player is left-handed and the other seven are right-handed. Their predominant playing styles are a combination of two-sided reverse rubber looping drives with quick attacks, and a blend of two-sided reverse rubber quick attacks with looping drives. The research evaluates and analyzes the Technical Effectiveness (TE) indicators of the techniques used by these eight players in selected major individual matches over the last four years (40 matches, comprising 226 games and 4177 points). All match footage is derived from official television broadcasts and online sources, and the research was approved by the Adamson University Ethics Committee (Approval No. 2024-04-EDU-107). Information on the 40 matches is given in Table 1.

Table 1 Information on table tennis players participating in competitions.

Video analysis

Data were collected by observing publicly available videos of representative international matches played by the eight athletes from 2018 to 2021 (40 matches in total), taken from official websites such as China Central Television (CCTV) Sports Channel (https://sports.cctv.com), Bilibili (https://www.bilibili.com), and the International Table Tennis Federation (https://www.ittf.com). Prior to data collection, a technical and tactical statistics spreadsheet was prepared in Excel; the online match footage was then observed and documented through video analysis. The raw data consist of the final stroke of each point, recorded for both players as a point won or a point lost. For example, if one player serves and the other player scores with an attacking return of serve, the receiving player is recorded as winning the point on the 2nd stroke and the serving player as losing the point on the 3rd stroke. This process was repeated for every point.

To ensure the authenticity and reliability of the data, data collection for all matches was conducted independently by our research team. In addition, three students majoring in table tennis were trained to observe and record the match data. To verify consistency, we compared the data we collected with the data recorded independently by the students (20 matches each). The Kappa test gave Kappa = 0.950, P < 0.01, indicating excellent agreement in the observed data (Table 2).

Table 2 Kappa consistency test.
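The agreement check described above can be illustrated with a minimal sketch of Cohen's kappa for two observers. The function name `cohen_kappa` and the ten example codes are hypothetical and are not study data; the study itself reports the statistic from SPSS.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of points")
    n = len(rater_a)
    # Observed agreement: share of points coded identically by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical final-stroke codes for ten points (illustrative only).
researcher = ["serve", "3rd", "receive", "4th", "5th",
              "6th", "rally1", "rally2", "serve", "3rd"]
student = ["serve", "3rd", "receive", "4th", "5th",
           "6th", "rally1", "rally2", "serve", "4th"]
kappa = cohen_kappa(researcher, student)
```

With nine of ten codes in agreement, the observed agreement is 0.9 and the chance agreement 0.13, so kappa is 0.77 / 0.87, roughly 0.885.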

New four-phase index statistical method

The indicators in this study follow the “New Four-Phase Index Statistical Method,” an improvement on the “Four-Phase Index Evaluation Method” developed by Yang and Zhang10 that also draws on the work of Jiang et al.30 and Zhou et al.31. The 5th and 6th strokes form the transitional phase between offensive and defensive play: they mark the passage from the attack-after-serve phase (1st and 3rd strokes) and the attack-after-receive phase (2nd and 4th strokes) to the rally phase. Given expert opinion and previous research on the significance of these two strokes, and given the longer rallies that followed the adoption of the new-material 40+ ball, conventional statistical methods such as the three-phase index method and the ten-index method no longer meet the challenges posed by rule changes and equipment reforms in table tennis. Hence, this study departs from the traditional practice of selecting single-phase indicators and refines the method of Yang and Zhang10, whose division comprises the attack-after-serve phase (serve, the 3rd stroke, points lost on the 5th stroke), the attack-after-receive phase (receive, the 4th stroke), rally phase I (points won on the 5th stroke, the 7th and subsequent odd-numbered strokes), and rally phase II (the 6th and subsequent even-numbered strokes). 
Here, the 5th and 6th strokes are designated as two separate observation indicators (the transition phase), while rally phase I (the 7th and subsequent odd-numbered strokes) and rally phase II (the 8th and subsequent even-numbered strokes) are set as two further observation indicators (the rally phase). The attack-after-serve phase (serve, the 3rd stroke) and the attack-after-receive phase (receive, the 4th stroke) follow the classic “Three-Phase Index” division. The observation indicators are thus divided into four phases: attack-after-serve (serve, 3rd stroke), attack-after-receive (receive, 4th stroke), transition (5th stroke, 6th stroke), and rally (rally I, rally II), giving eight observation indicators for match statistics and analysis. The model of the technical and tactical indicator system is illustrated in Fig. 1.

Fig. 1 Table tennis technical and tactical indicator system model.

Definitions and explanations of each observation indicator are as follows.

Attack-after-Serve phase

Serve (the 1st stroke) refers to the first stroke technique that is not constrained or limited by the opponent’s shot32,33,34.

3rd stroke refers to the stroke played by the serving player in response to the opponent’s return after their serve, utilizing either an attacking or controlling technique to hit the ball.

Attack-after-Receive phase

Receive serve (the 2nd stroke) refers to the various techniques used when returning the opponent’s serve35.

The 4th stroke refers to the techniques employed when receiving the opponent’s 3rd stroke, involving various attacking and controlling techniques.

Transition phase

The 5th stroke refers to the various attack and defense transition techniques used by the serving player when receiving the opponent’s 4th stroke.

The 6th stroke refers to the various attack and defense transition techniques used by the receiving player when returning the opponent’s 5th stroke.

Rally phase

Rally phase I refers to the various rally techniques used by the serving player on the 7th and subsequent odd-numbered strokes, that is, when returning the opponent’s 6th and subsequent even-numbered strokes (including both sides maintaining the attack or one side maintaining the defense).

Rally phase II refers to the various rally techniques used by the receiving player on the 8th and subsequent even-numbered strokes, that is, when returning the opponent’s 7th and subsequent odd-numbered strokes.
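The phase division above can be sketched as a lookup from the ordinal number of a stroke to its phase and observation indicator. The names `FIXED_INDICATORS`, `indicator_for_stroke`, and `record_winning_stroke` are hypothetical. The recording rule covers only the case described in the video-analysis example, where the final stroke wins the point and the opponent is charged with a loss on the following stroke.

```python
# Strokes 1-6 each have their own observation indicator.
FIXED_INDICATORS = {
    1: ("attack-after-serve", "serve"),
    2: ("attack-after-receive", "receive"),
    3: ("attack-after-serve", "3rd stroke"),
    4: ("attack-after-receive", "4th stroke"),
    5: ("transition", "5th stroke"),
    6: ("transition", "6th stroke"),
}

def indicator_for_stroke(stroke_number):
    """Map a stroke's ordinal number within a point to (phase, indicator)."""
    if stroke_number < 1:
        raise ValueError("stroke numbers start at 1 (the serve)")
    if stroke_number in FIXED_INDICATORS:
        return FIXED_INDICATORS[stroke_number]
    # From the 7th stroke on, odd strokes (server) are rally I
    # and even strokes (receiver) are rally II.
    return ("rally", "rally I" if stroke_number % 2 == 1 else "rally II")

def record_winning_stroke(stroke_number):
    """Return (winner's indicator, loser's indicator) when the final stroke scores."""
    return (indicator_for_stroke(stroke_number),
            indicator_for_stroke(stroke_number + 1))

# A receiver who scores directly with the return of serve (2nd stroke).
won_on, lost_on = record_winning_stroke(2)
```

Here `won_on` is the receive indicator and `lost_on` is the 3rd stroke, matching the worked example in the video-analysis section.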

Algorithms for scoring rate and utilization rate of each observation indicator

To present the computation of the scoring rate and utilization rate of the eight observation indicators in the “New Four-Phase Index,” this study takes as the observation point whether Player A won or lost the point on the final stroke of each point. The codes for each observation indicator are presented in Table 3.

Table 3 Codes for scoring and losing points in table tennis Technical-Tactical phases.

For ease of description, let Z represent the total number of points won and lost in the entire match, i.e., Z = A + B + C + D + E + F + G + H (the same applies below). The formulas for calculating the scoring rate and usage rate of each observation indicator are shown in Table 4.

Table 4 Formulas for calculating scoring rate and usage rate of each observation Indicator.
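As a rough illustration of the two rates, the sketch below assumes the definitions commonly used in this literature: the scoring rate of an indicator is points won divided by points won plus lost on that indicator, and the usage rate is points won plus lost on that indicator divided by the match total Z. Table 4 remains the authoritative statement of the formulas. The function name and the point counts are hypothetical.

```python
def scoring_and_usage_rates(points):
    """Return {indicator: (scoring_rate, usage_rate)} for one player in one match.

    `points` maps an indicator name to (points_won, points_lost).
    """
    total = sum(won + lost for won, lost in points.values())  # Z in the text
    if total == 0:
        raise ValueError("no points recorded")
    rates = {}
    for name, (won, lost) in points.items():
        played = won + lost
        scoring_rate = won / played if played else 0.0
        usage_rate = played / total
        rates[name] = (scoring_rate, usage_rate)
    return rates

# Hypothetical counts for one match (illustrative only, not study data).
example_points = {
    "serve": (6, 2), "3rd stroke": (10, 6),
    "receive": (5, 5), "4th stroke": (8, 8),
    "5th stroke": (6, 4), "6th stroke": (5, 5),
    "rally I": (4, 6), "rally II": (5, 5),
}
rates = scoring_and_usage_rates(example_points)
```

In this example the serve has a scoring rate of 0.75 and a usage rate of 8/90, and the usage rates of the eight indicators sum to 1.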

Calculation of technical effectiveness values for each observational index

Technical Effectiveness (TE) refers to the effective utilization of an athlete’s competitive actions (technique or tactics), influenced by two key indicators: scoring rate and utilization rate. When the scoring rate exceeds 0.5, a higher utilization rate increases the probability of winning. Conversely, when the scoring rate is less than 0.5, a higher utilization rate exacerbates the negative impact on the probability of winning the match. Therefore, the significance of the Technical Effectiveness (TE) value primarily depends on the quality of the scoring rate, with the utilization rate serving as an auxiliary reference value for observation. Hence, in order to encompass both the scoring rate and utilization rate, this study selected the Technical Effectiveness values of various indicators for evaluation. Furthermore, following this principle, Zhang et al.8 also proposed a formula for calculating Technical Effectiveness, as follows:

\(\mathrm{TE} = -\left(1+\frac{\sqrt{2}}{2}\right) + \left(1.5+\sqrt{2}\right) \times \left(1+\mathrm{UR}\right)^{\mathrm{SR}-0.5} - \frac{\sqrt{2}}{2}\left(1+\mathrm{UR}\right)^{2(\mathrm{SR}-0.5)}\)

By using the formula in Table 4 and substituting the values of scoring rate and utilization rate, calculated from the conversion of scored and lost points for each observation index in the “new four-Phase index,” the corresponding Technical Effectiveness values for each observational index can be obtained.
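A minimal sketch of the TE calculation follows. It assumes that the last term of the formula is subtracted, which is the sign under which TE equals 0.5 at a scoring rate of 0.5 and runs from 0 to 1 when the usage rate is 1. The function name `technical_effectiveness` is hypothetical.

```python
import math

def technical_effectiveness(scoring_rate, usage_rate):
    """TE from scoring rate (SR) and usage rate (UR), both in [0, 1]."""
    root2 = math.sqrt(2.0)
    base = 1.0 + usage_rate
    shift = scoring_rate - 0.5
    return (-(1.0 + root2 / 2.0)
            + (1.5 + root2) * base ** shift
            - (root2 / 2.0) * base ** (2.0 * shift))  # assumed minus sign

# Boundary behaviour of the formula under the stated assumption.
te_neutral = technical_effectiveness(0.5, 0.3)  # SR = 0.5 gives 0.5 for any UR
te_best = technical_effectiveness(1.0, 1.0)     # upper bound
te_worst = technical_effectiveness(0.0, 1.0)    # lower bound
```

These three values come out at 0.5, 1, and 0 (up to floating-point error). At a fixed usage rate, TE rises with the scoring rate, and a higher usage rate amplifies the departure from 0.5 in either direction, which matches the reasoning in the paragraph above.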

Data transformation and processing

First, following the new four-phase index statistical method designed in this paper, the point won or lost on the last stroke was recorded for each observation indicator, yielding the raw data for the matches between the selected elite male players and their opponents. The raw data were then entered into a Microsoft Excel spreadsheet to establish a database. Next, from the points won and lost on each observation indicator in each match, the scoring rate and usage rate were calculated with the formulas in Table 4, and the Technical Effectiveness value was calculated by substituting the scoring rate and usage rate into the Technical Effectiveness formula. The Technical Effectiveness values are the empirical data used in this study. Finally, SPSS 25.0 statistical software (SPSS Inc., Chicago, IL, USA) was used to conduct principal component and cluster analysis to verify the application of the PCA-CA comprehensive evaluation method to table tennis Technical Effectiveness. The Technical Effectiveness evaluation model for table tennis matches is shown in Fig. 2.

Fig. 2 Table tennis match technical effectiveness evaluation model.

Table tennis match Technical Effectiveness evaluation model: Initially, the Principal Component Analysis (PCA) method is used to transform the initial indicator data to identify the critical components and contribution weights that significantly affect the comprehensive athletic strength of athletes in the matches they participate in. Subsequently, based on PCA, Cluster Analysis is conducted using the data of principal component scores to classify the total athletic strength of athletes in various matches.

Operating steps

Step 1

Data Processing - First, apply the “trend convergence” method to process various indicators, which involves converting reverse indicators to direct ones, ensuring that all indicators are comparable in the same direction36. Since the indicators in this study represent high-quality technical efficiency, meaning that higher values correspond to a greater probability of winning matches, there is no need to perform “trend convergence.” Next, standardize the indicators to eliminate differences in scale and magnitude between relative and absolute indicators.

Step 2

Compute the covariance matrix of the standardized data, which is equivalent to the correlation matrix of the original indicators. Its element \(R_{ij}\) is the correlation coefficient between the ith and jth tactical indicators across the n matches, where \(x_{ki}\) is the value of indicator i in match k and \(\bar{x}_i\) is the mean of indicator i. The formula is:

\({R_{ij}}=\frac{\sum\limits_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum\limits_{k=1}^{n} (x_{ki} - \bar{x}_i)^2 \sum\limits_{k=1}^{n} (x_{kj} - \bar{x}_j)^2}}\)
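Steps 1 and 2 can be sketched in plain Python as follows. This is an illustration under assumptions rather than the SPSS procedure itself: the function names are hypothetical, the standardization uses the sample standard deviation (n - 1 in the denominator), and the two short columns are made-up values.

```python
import math

def standardize(column):
    """Z-score a list of values using the sample standard deviation (assumed)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def correlation(x, y):
    """Pearson correlation between two indicators observed over the same matches."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                            * sum((b - mean_y) ** 2 for b in y))
    return numerator / denominator

def correlation_matrix(columns):
    """Matrix R with one row and one column per indicator."""
    return [[correlation(ci, cj) for cj in columns] for ci in columns]

# Hypothetical TE values of two indicators over four matches.
te_serve = [0.2, 0.4, 0.6, 0.8]
te_receive = [0.8, 0.6, 0.4, 0.2]
R = correlation_matrix([te_serve, te_receive])
z = standardize([1.0, 2.0, 3.0])
```

Because the correlation coefficient is unchanged by standardization, `correlation_matrix` gives the same result on raw and on standardized columns.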

Step 3

Calculate the eigenvalues, variance contribution rate, and cumulative variance contribution rate, indicating the importance of primary technical indicators influencing the comprehensive athletic strength in each match. A higher variance contribution rate suggests a more crucial common factor and a more significant contribution to the variables36. The primary component contribution rate is (1) and the cumulative contribution rate is (2).

$$W_{i} = \frac{\lambda_{i}}{\sum\limits_{j = 1}^{p} \lambda_{j}}$$
(1)
$$\sum\limits_{i = 1}^{m} W_{i} = \frac{\sum\limits_{i = 1}^{m} \lambda_{i}}{\sum\limits_{j = 1}^{p} \lambda_{j}} \quad (m \le p)$$
(2)

Step 4

Determine the principal components and construct the principal component evaluation matrix. Generally, an eigenvalue greater than 1 is used as one of the criteria for extracting principal components37,38. Let the set of retained principal components be P = {P1, P2, …, Pk}, where Pj denotes the jth principal component; together these components contain most of the information. These mutually independent components replace the original indicators for evaluating the athletic strength of each match. Let \(E = (e_{tj})_{p \times k}\) be the matrix of principal component coefficients, where \(e_{tj}\) is the coefficient of the tth indicator in the jth principal component, and let \(D = (a_{it})_{n \times p}\) be the standardized data matrix. The principal component scores are then Y = D × E, where \(y_{ij} = \sum\limits_{t=1}^{p} a_{it} e_{tj}\).

Step 5

Calculate the comprehensive score coefficients. For the jth principal component of the ith object under evaluation (i = 1, 2, …, n; j = 1, 2, …, k), the information contribution rate of the principal component to the original variables serves as its weight in the comprehensive evaluation score. This yields the comprehensive score for the ith object under evaluation, \(Z_{i} = \sum\limits_{j = 1}^{k} W_{j} Y_{ij}\). Using the comprehensive scores of each evaluated object, the comprehensive evaluation results for each match can be deduced.

Step 6

Establish a systematic (hierarchical) clustering procedure. Calculate the distances \(d_{ij}\) between all n samples described by the retained principal components and collect them in the matrix \(D = (d_{ij})_{n \times n}\). Start with n categories, each containing a single sample, and set all platform heights to zero. Then repeatedly merge the two nearest categories into a new one, using the distance between them as the platform height in the clustering graph, until the classification is complete. The distance criterion employed is the squared Euclidean distance \(d_{ij} = \sum\limits_{k} (x_{ki} - x_{kj})^{2}\), where \(x_{ki}\) is the score of sample i on the kth principal component.
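The merging procedure in Step 6 can be sketched as a small agglomerative routine. As an assumption, the merge cost below is Ward's criterion on squared Euclidean distances, in line with the method named in the Results; the heights SPSS reports may be scaled differently. The function names and the six two-dimensional samples are hypothetical.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def ward_clusters(samples, n_clusters):
    """Merge samples until n_clusters remain.

    Returns (list of member-index lists, list of merge heights).
    """
    if not 1 <= n_clusters <= len(samples):
        raise ValueError("n_clusters must be between 1 and the number of samples")
    # Start with one cluster per sample: (member indices, centroid).
    clusters = [([i], list(s)) for i, s in enumerate(samples)]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                # Ward cost: increase in within-cluster sum of squares.
                cost = (len(ma) * len(mb) / (len(ma) + len(mb))
                        * squared_euclidean(ca, cb))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        members = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(members)
                    for x, y in zip(ca, cb)]
        heights.append(cost)   # platform height of this merge
        del clusters[b]        # b > a, so index a stays valid
        clusters[a] = (members, centroid)
    return [sorted(m) for m, _ in clusters], heights

# Hypothetical principal component scores of six matches (two components).
scores = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
groups, heights = ward_clusters(scores, 3)
```

On this toy input the three tight pairs are merged first, leaving three groups of two samples each.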

Results

Suitability analysis of the principal component model

Based on the previously established technical and tactical indicator system, this study used the eight technical observation indicators of the improved “New Four-Phase Index” as the evaluation data. The data were standardized (z-score standardization) to eliminate the influence of scale. The suitability of the data was then tested: KMO = 0.643, which is above the basic threshold of 0.5, and Bartlett’s test of sphericity gave an approximate chi-square value of 64.902 (P < 0.01), indicating that the selected variables are suitable for principal component analysis (Table 5).

Table 5 KMO and Bartlett test.

Principal component extraction and loading coefficient matrix

Following the extraction principle of λ ≥ 1 for principal components, and considering both the variance contribution rates and the scree plot39, the final number of principal components was determined. Table 6 and Fig. 3 show the eigenvalues: λ1 = 2.133, λ2 = 1.791, λ3 = 1.058, λ4 = 1.046. The cumulative contribution rate reached 75.343%, meaning that 75.343% of the information in the original indicators was retained. The first four principal components were therefore considered to provide sufficient information and were chosen as the indicator data for cluster analysis.
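The contribution rates can be re-derived from the reported eigenvalues with a few lines of arithmetic. The sketch relies on the fact that the eigenvalues of a correlation matrix of eight indicators sum to 8; the variable names are hypothetical, and the eigenvalues are the rounded values quoted above.

```python
# Eigenvalues of the four retained components, as reported in the text.
eigenvalues = [2.133, 1.791, 1.058, 1.046]
n_indicators = 8  # eigenvalues of an 8 x 8 correlation matrix sum to 8

# Variance contribution rate of each component (equation 1).
contribution = [lam / n_indicators for lam in eigenvalues]

# Cumulative contribution rate (equation 2).
cumulative = []
running = 0.0
for rate in contribution:
    running += rate
    cumulative.append(running)

# Components kept under the eigenvalue >= 1 criterion.
retained = [lam for lam in eigenvalues if lam >= 1.0]
```

The first rate is 2.133 / 8, about 26.66%, and the cumulative rate is 6.028 / 8, about 75.35%, which agrees with the reported 75.343% up to rounding of the eigenvalues.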

Table 6 Principal component eigenvalues and variance explained rate.
Fig. 3 Characteristic root scree plot of PCA.

In specific statistical practice, the impact of an assessment indicator on the principal components is closely related to the loading matrix of the principal components. Based on the principal component loading matrix results (Table 7), PC1 has significant loadings on X4 (receiving), X8 (Rally II), and X6 (the 6th Stroke) with loading coefficients of 0.672, 0.662, and 0.661, respectively. This reflects the athlete’s ability to transition from attack after receiving to the rally phase in terms of technical and tactical performance. PC2 has significant loadings on X8 (Rally II) and X6 (the 6th stroke). When combined with the high loading coefficients of PC1, it further explains the athlete’s competitive performance in rally and the transition between attacking and defending in PC2. PC3 and PC4 have significant loadings on X5 (the 4th stroke) and X2 (the 3rd stroke), with loading coefficients of 0.743 and 0.643, respectively. This indicates the athlete’s ability to perform attack after receive (the 4th stroke) and serving and attack after serve (the 3rd stroke), which are primary means of scoring in the match.

Table 7 Load coefficient matrix.

Principal component score coefficient matrix

Based on the principal component loading coefficient matrix and the four principal component eigenvalues, the principal component score coefficient matrix can be derived (Table 8). A linear equation can then be established for each principal component score, together with a comprehensive score that uses the corresponding variance contribution rates as weights:
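One common way to derive score coefficients from loadings, and the convention assumed in this sketch, is to divide each loading by the square root of the eigenvalue of its component. The function name is hypothetical. Applied to the three PC1 loadings quoted in the text, it gives values that match the corresponding PC1 coefficients to three decimals.

```python
import math

def score_coefficient(loading, eigenvalue):
    """Linear-combination coefficient from a loading (assumed convention)."""
    return loading / math.sqrt(eigenvalue)

# PC1 loadings quoted in the text and the PC1 eigenvalue.
pc1_eigenvalue = 2.133
pc1_loadings = {"X4": 0.672, "X8": 0.662, "X6": 0.661}

pc1_coefficients = {
    name: round(score_coefficient(loading, pc1_eigenvalue), 3)
    for name, loading in pc1_loadings.items()
}
```

The results are 0.46 for X4 and 0.453 for both X8 and X6.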

\(\begin{aligned} \mathrm{PC}_{1}\,\text{Score} = {} & 0.341\,X_1 + 0.082\,X_2 - 0.37\,X_3 + 0.46\,X_4 \\ & + 0.068\,X_5 + 0.453\,X_6 - 0.336\,X_7 + 0.453\,X_8 \end{aligned}\)

\(\begin{aligned} \mathrm{PC}_{2}\,\text{Score} = {} & -0.232\,X_1 - 0.376\,X_2 + 0.369\,X_3 - 0.031\,X_4 \\ & - 0.225\,X_5 + 0.459\,X_6 - 0.431\,X_7 + 0.469\,X_8 \end{aligned}\)

\(\begin{aligned} \mathrm{PC}_{3}\,\text{Score} = {} & 0.358\,X_1 + 0.233\,X_2 + 0.498\,X_3 - 0.101\,X_4 \\ & + 0.723\,X_5 + 0.093\,X_6 + 0.135\,X_7 + 0.097\,X_8 \end{aligned}\)

\(\begin{aligned} \mathrm{PC}_{4}\,\text{Score} = {} & 0.404\,X_1 + 0.621\,X_2 + 0.032\,X_3 - 0.023\,X_4 \\ & - 0.51\,X_5 + 0.051\,X_6 + 0.431\,X_7 - 0.021\,X_8 \end{aligned}\)

\(\begin{aligned} \mathrm{PC}_{\mathrm{COMP}}\,\text{Score} = {} & (26.66\% \times \mathrm{PC}_{1}\,\text{Score} + 22.390\% \times \mathrm{PC}_{2}\,\text{Score} \\ & + 13.222\% \times \mathrm{PC}_{3}\,\text{Score} + 13.071\% \times \mathrm{PC}_{4}\,\text{Score}) / 75.343\% \end{aligned}\)

Note The terms PC1 Score, PC2 Score, PC3 Score, and PC4 Score respectively represent the scores of Principal Component 1, Principal Component 2, Principal Component 3, and Principal Component 4; PC COMP Score denotes the composite score of the principal components, with the same notation used hereinafter.
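The score equations can be applied to one match as in the sketch below. The coefficients and weights are copied from the equations above; the function names are hypothetical, and the inputs are assumed to be the standardized TE values X1 to X8 of that match.

```python
# Coefficients of X1..X8 in the PC1 score equation (from the text).
PC1_COEFFICIENTS = [0.341, 0.082, -0.37, 0.46, 0.068, 0.453, -0.336, 0.453]
# Variance contribution rates (%) of PC1..PC4 (from the text).
WEIGHTS = [26.66, 22.390, 13.222, 13.071]

def component_score(coefficients, standardized_te):
    """Score of one principal component for one match."""
    if len(coefficients) != len(standardized_te):
        raise ValueError("one standardized TE value is needed per coefficient")
    return sum(c * x for c, x in zip(coefficients, standardized_te))

def composite_score(pc_scores, weights=WEIGHTS):
    """Weighted mean of the component scores (PC COMP Score)."""
    if len(pc_scores) != len(weights):
        raise ValueError("one score is needed per weight")
    return sum(w * s for w, s in zip(weights, pc_scores)) / sum(weights)

# A match that is one standard deviation above average on X1 only.
pc1_example = component_score(PC1_COEFFICIENTS, [1, 0, 0, 0, 0, 0, 0, 0])
# A match scoring 1 on every component has a composite score of 1.
composite_example = composite_score([1.0, 1.0, 1.0, 1.0])
```

Dividing by the sum of the weights (75.343) makes the composite a weighted average, so it stays on the same scale as the component scores.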

Table 8 Linear combination coefficient matrix.

Comprehensive evaluation and ranking of table tennis competitions

For ease of reference, this study uses the symbols H1 to H40 to denote the 40 table tennis matches. Using the model described above, each match’s comprehensive score was computed and ranked (Table 9). Existing research indicates that the higher the comprehensive score, the better the competitive strength shown in the corresponding match40. According to Table 9, 21 of the 40 matches had positive comprehensive scores and 19 had negative scores. The top 11 matches in the comprehensive ranking were won with relative dominance, in line with the athletes’ competitive strength. Some matches with positive comprehensive scores were nevertheless lost, such as H12 (0.419), H21 (0.371), H25 (0.076), and H11 (0.055). The reason is that the athletes lacked an absolute advantage in their encounters with top-level opponents: when both sides are of roughly equal strength, small differences can decide the outcome either way. Addressing this requires athletes to further improve their technical and tactical proficiency and to strengthen their psychological attributes, especially in handling critical moments of the match. Matches with larger negative comprehensive scores had lower probabilities of winning, suggesting weaker sustained offensive and defensive transition skills. However, matches with negative comprehensive scores occasionally resulted in victories, because the athletes brought their technical and tactical skills fully to bear in certain phases of the match. In such matches the probability of defeat remained relatively high, because the allocation of techniques and tactics across phases was uneven or distinct tactical deficiencies existed. 
Moreover, the phenomenon of “imbalance between the total score and the outcome”41,42 was also observed, in which athletes won the match but had a lower total score than their opponents. One example is H5, where the comprehensive score was negative yet the athlete emerged victorious. Research indicates that the probability of such imbalances in the outcome of table tennis matches is around 5%43. Athletes may also perform below their best for many reasons beyond techniques and tactics. For instance, the psychological handling of “key points” within the game, the location of the competition, the atmosphere created by the audience at the venue, and the guidance given by coaches can all markedly influence an athlete’s competitive performance. These elements are also key areas for scholars to address in future research.

Table 9 Principal component scores, composite scores, and ranking.

Cluster analysis of table tennis competitions’ comprehensive competitiveness

Cluster Analysis (CA) is a method that categorizes research samples based on specific criteria, providing results that are objective, comprehensive, and scientific44. Building upon the principal component analysis, this study applied systematic clustering [Note: The rationale for employing systematic clustering lies in its simplicity, speed, and straightforward implementation. It can handle clusters of diverse shapes and densities and can detect outliers] to the four principal component scores of the 40 table tennis matches to assess their comprehensive competitiveness, using Ward’s method [Note: Ward’s method is a widely used hierarchical clustering algorithm that separates data points into distinct groups or clusters] and squared Euclidean distances [Note: This refers to the squared Euclidean distance between two points in space]. The clustering outcomes are shown in Fig. 4 and Table 10. The 40 matches were categorized into three classes: the first class had 10 matches (25%), the second class had 16 matches (40%), and the third class had 14 matches (35%). The first class, defined as “fluctuations in competitive levels,” differed somewhat from the comprehensive ranking in Table 9. In these matches the athletes demonstrated noticeable fluctuations in their competitive level, a phenomenon known as the “bipolar trend”45,46: they performed strongly against slightly weaker opponents but markedly worse against stronger opponents. The primary reason is an incomplete technical system, particularly deficiencies in linking offensive and defensive tactical transitions. 
Beyond ensuring physical fitness and mental preparedness, further improvements in the technical system are required to mitigate such issues in these matches. The second class primarily exhibited relatively stable competitive performance in these matches, defined as “consistent performance levels”. Athletes could effectively channel their strengths in technical tactics and maintain a high-quality offensive and defensive awareness from attack after serve and attack after receive phase to rally phase, proving to be a crucial element for winning. Conversely, matches in the third class demonstrated relatively weaker competitive performances, defined as “weaker performance levels”. In these matches, athletes faced significant point losses, revealing notable deficiencies in the connection of technical tactics. The failure to leverage the offensive advantages of the first four strokes in both attack after serve and attack after receive phase led to passive point losses in the rally phase, disrupting the overall technical and tactical rhythm of the match, which was the fundamental reason for defeats in these matches. This analysis substantiates that athletes’ stable offensive and defensive transitions across different technical phases, combined with having absolute advantages in specific technical phases, are fundamental for ensuring victory in the match47.

Fig. 4
Systematic cluster analysis diagram of comprehensive competitive strength of 40 table tennis matches (Note: The horizontal axis represents the range of clustering, and the vertical axis represents the distance of the sample points.)

Table 10 Clustering categories of comprehensive competitive strength of 40 table tennis matches.
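
The Ward's-method clustering described above can be illustrated with a minimal sketch. It is not the authors' implementation (the statistical software used is not stated in this section); it merges, at each step, the two clusters whose union gives the smallest increase in within-cluster sum of squares, which is Ward's criterion on squared Euclidean distance. The function name `ward_clusters` and the six four-dimensional score vectors are hypothetical and are not data from Table 9.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerate points into k clusters using Ward's minimum-variance criterion."""
    # Each cluster is stored as (member indices, centroid coordinates).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b were merged.
            cost = na * nb / (na + nb) * sq_dist
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical principal component scores (PC1..PC4) for six matches.
pc_scores = [
    (1.0, 1.2, 0.9, 1.1), (1.1, 1.0, 1.0, 0.9),
    (0.0, 0.1, -0.1, 0.0), (0.1, 0.0, 0.0, -0.1),
    (-1.0, -1.1, -0.9, -1.0), (-1.1, -0.9, -1.0, -1.1),
]
groups = ward_clusters(pc_scores, 3)
print(groups)
```

The pairwise search is quadratic per merge, which is acceptable for 40 matches; a library routine would be the more practical choice for larger samples.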

Discussions

Criteria for establishing observation indices for table tennis tactics

With the continuous evolution of international table tennis rules and equipment, the tactical systems in table tennis have undergone significant changes, shifting the focus from the first three strokes (the “front three boards”) to multiple-stroke rallies48. These reforms aim to slow down ball speed, reduce spin, and increase the number of rallies, ultimately promoting balance in table tennis and increasing its appeal to audiences49. Research has shown that ABS (Acrylonitrile Butadiene Styrene) plastic balls (the 40 mm+ ball, 40 mm < diameter ≤ 40.6 mm) produce approximately 5% less spin and 2% less ball speed than celluloid balls (the 40 mm ball, 39.5 mm ≤ diameter ≤ 40 mm)50. As a result, rally lengths in high-level table tennis matches have increased to an average of 5 to 8 rallies per point, breaking the previous pattern of 2 to 5 rallies per point on average31. Longer rallies naturally lead to changes in the evaluation system for table tennis tactics, which will continue to evolve with technological advances such as artificial intelligence, further pushing players to improve their tactical and technical skills51,52. Therefore, continually improving the table tennis tactical evaluation system is a necessary step for coaches and researchers seeking breakthroughs in players’ competitive levels. Researchers have explored various tactical evaluation systems over the years. The earliest, dating back to 1988, was the “Three-Phase Evaluation Method” proposed by the Chinese scholar Wu Huanqun. This method laid an essential foundation for diagnosing and assessing table tennis tactics at that time and contributed to maintaining China’s dominant position in table tennis for an extended period53. 
As table tennis rules and equipment continue to evolve, tactical evaluation systems have evolved accordingly. Evaluation systems built on the “Three-Phase Evaluation Method” include the “Three-Phase”4,54,55, “Dynamic Three-Phase”9,56, “Double Three-Phase”57, “Interactive Three-Phase Structure”58, “Four-Phase”10,59,60,61, “Five-Phase”30, “Six-Phase”62, “Nine-Phase”63, and “Ten-Phase”5 systems, among others (Table 11). While these tactical observation systems offer various options for selecting observation indices, they pose challenges in practice, including discrepancies in data collection, complex data recording processes that can lead to omissions and missing data, and difficulties in capturing dynamic changes in tactical indices10,61.

According to the statistics in Table 11 on the observation index systems for table tennis competitions proposed by researchers (1988–2023), the “Three-Phase Evaluation Method” proposed by Wu4 has been used most frequently in the academic community. The second most used is the “Four-Phase Index Evaluation Method” proposed by Yang and Zhang10, which has been used 39 times, accounting for 29.1%. The other index systems have been used relatively rarely, each accounting for 0.74–8.2%. Beyond the problems mentioned above, these technical and tactical observation index systems have also failed to keep pace with the ITTF’s equipment reforms, which have changed athletes’ technical and tactical systems. Therefore, based on the table tennis tactical observation index system constructed by Yin et al.61, this study selected eight sub-observation indices, following the principles of the “New Four-Phase Observation Method.” This method preserves the traditional three-phase tactical indices for the attacking phase (1st and 3rd strokes) and the defending phase (2nd and 4th strokes) while grouping the 5th and 6th strokes as a transition phase. The rationale is that, with ABS plastic balls, the 5th and 6th strokes play a crucial role in the transition from attacking to defending or receiving, serving as a bridge between the two30. Therefore, this study treats the 5th and 6th strokes as a distinct transition phase. Additionally, because table tennis matches have no fixed upper limit on the number of rallies per point, but high-level players rarely engage in continuous multiple-stroke rallies, this study assigned the odd-numbered strokes from the 7th stroke onward to rally I and the even-numbered strokes from the 8th stroke onward to rally II, collectively referred to as the rally phase. 
This approach not only avoids the shortcoming of the classic “Three-Phase Evaluation Method”, where data inconsistency arises on the 5th stroke, but also addresses the issue in the “Four-Phase Evaluation Method”, where a point lost on the 5th stroke in the serve round is attributed to the attack after serve phase whereas a point won on the 5th stroke is attributed to rally I, which can lead to omissions in the statistical analysis of technical data. Furthermore, it adapts to equipment reforms that have pushed table tennis athletes towards multiple-stroke rallies, a trend that makes it challenging to categorize athletes’ competitive strengths at different phases effectively and clearly61. Thus, this study selected the eight sub-observation indices from the “New Four-Phase Observation Method” as the basis for data collection and analysis, as they are more feasible for assessing table tennis match data.
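
The stroke-to-phase assignment described above can be written down as a short sketch. The function name `phase_of_stroke` is hypothetical, and the phase labels “attack after serve” and “attack after receive” are borrowed from their use elsewhere in this paper for the 1st/3rd and 2nd/4th strokes respectively; the original coding sheet may use different labels.

```python
def phase_of_stroke(stroke_number):
    """Map a stroke number within a point to its New Four-Phase category."""
    if stroke_number < 1:
        raise ValueError("stroke numbers start at 1")
    if stroke_number in (1, 3):
        return "attack after serve"      # server's 1st and 3rd strokes
    if stroke_number in (2, 4):
        return "attack after receive"    # receiver's 2nd and 4th strokes
    if stroke_number in (5, 6):
        return "transition"              # 5th and 6th strokes
    # From the 7th stroke onward: odd strokes are rally I, even strokes rally II.
    return "rally I" if stroke_number % 2 == 1 else "rally II"


# Label the first nine strokes of a point.
for n in range(1, 10):
    print(n, phase_of_stroke(n))
```

Because every stroke number maps to exactly one category, a point ending on the 5th stroke is recorded under the transition phase whether it is won or lost, which is the consistency property the text attributes to this scheme.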

Table 11 Table tennis match technique and tactics observation index system list (1988–2023).

Application effect of principal component analysis

Principal Component Analysis (PCA), as a dimensionality reduction method26, is currently well-established in healthcare assessment systems, organizational performance management, quality evaluation of agricultural products, and environmental monitoring64. Its application in the field of sports research includes sports industry and public services36,65, evaluation of sports teaching quality66, and assessment of physical activities67,68. In this study, we employed Principal Component Analysis to analyze eight selected sub-observational indicators from the New Four-Phase Index in 40 international high-level table tennis matches to evaluate the overall competitive strength of each match. Using this method, we established four principal components, which collectively accounted for 75.343% of the variance in the original variables. These components were used to summarize the attribution of other observational indicators, emphasizing the athletes’ ability in different phases of the match or specific observational techniques (Table 7). Furthermore, the scores from these four principal components corresponded effectively with the overall scores (Table 9), providing a relatively objective and rational reflection of the comprehensive competitive strength in each match. A higher score in principal components indicated a greater probability of victory in the entire match, as supported by the consistency between the top 11 ranked matches in principal component scores and the actual match outcomes. Conversely, lower-ranked matches in principal component scores carried higher risks of losing (although some special cases like the “imbalance between match total scores and results” occurred at a 5% probability), consistent with prior research findings42. 
At the same time, the scores of the four principal components reveal a noteworthy finding. Take the H35 match, which ranks first in the overall composite scores of the principal components. Table 7 shows that the four principal components, as defined by the loadings matrix, capture different aspects of competitive performance. Specifically, PC1 reflects the athlete’s ability to transition from the attack after receive phase to implementing rally tactics, PC2 reflects the athlete’s ability in the transition between offense and defense, PC3 reflects the athlete’s tactical ability in the fourth stroke, and PC4 reflects the athlete’s tactical implementation ability in the third stroke. Considering the scores of the four principal components (0.528, 1.978, 1.114, -0.476) presented in Table 9, the athlete from Party A excels in the ability to transition in rally (rally II, the sixth stroke) and in tactical implementation in the fourth stroke, whereas technical implementation in the third stroke appears relatively weaker in this match. Despite this relative weakness, the athlete won the match, which ranks first in the overall composite scores of the principal components across all matches. This finding indicates that the athlete successfully leveraged their strengths while compensating for the weaker aspects of their game. When an athlete’s strengths are less pronounced and their weaknesses more evident, the probability of losing is considerably higher. The ability of athletes to adjust and exploit their strengths according to the playing style of their opponents is a key factor in determining the outcome of the match. 
Furthermore, through a review of both domestic and international literature, it was discovered that only high-level basketball and soccer matches have employed principal component analysis for a comprehensive assessment of offensive and defensive abilities40,47,69. The applicability of these findings is limited, and no such research has emerged in skill-dominated, net-separated competitive sports. Given the successful application of principal component analysis in evaluating the comprehensive competitive abilities in high-level table tennis matches in this study, it holds certain value in contributing to the enrichment of research in this area.
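
To make the component-retention rule and the composite score concrete, the sketch below applies the eigenvalue ≥ 1 criterion to a list of eigenvalues and then combines principal component scores into one composite value. Two points are assumptions rather than facts from this study: the eigenvalues are hypothetical (chosen only so that eight standardized indicators sum to 8), and the composite is taken as a variance-weighted average of the component scores, a common convention that may differ from the formula behind Table 9. Only the four H35 component scores are taken from the text.

```python
def retain_components(eigenvalues, threshold=1.0):
    """Keep eigenvalues >= threshold and report the share of variance they explain."""
    ordered = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ordered if ev >= threshold]
    explained = sum(kept) / sum(ordered)
    return kept, explained


def composite_score(pc_scores, kept_eigenvalues):
    """Variance-weighted average of component scores (assumed weighting scheme)."""
    total = sum(kept_eigenvalues)
    return sum(ev / total * s for ev, s in zip(kept_eigenvalues, pc_scores))


# Hypothetical eigenvalues for eight standardized indicators (they sum to 8).
eigenvalues = [2.1, 1.6, 1.3, 1.0, 0.8, 0.6, 0.4, 0.2]
kept, explained = retain_components(eigenvalues)
print(len(kept), round(explained * 100, 1))

# PC1..PC4 scores reported for match H35 in the text.
h35_scores = (0.528, 1.978, 1.114, -0.476)
h35_composite = composite_score(h35_scores, kept)
print(round(h35_composite, 4))
```

With these hypothetical eigenvalues, four components are retained and explain 75.0% of the variance, close to but not the same as the 75.343% reported for the real data; the composite value printed for H35 therefore illustrates the calculation only and should not be read as the Table 9 score.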

Application effect of cluster analysis

Cluster analysis can effectively capture differences between various characteristics and group attributes that are similar together. The more indicators involved in the clustering, the better it reflects the comprehensive characteristics of the attributes64. In this study, we used cluster analysis to categorize the overall competitive strengths of 40 international high-level table tennis matches into three classes. Through variance analysis, there were statistically significant differences between all three categories (Table 12). This demonstrates the effectiveness of classifying the four principal components, indicating their ability to reflect the differences in competitive strengths of the athletes in these 40 matches effectively. It also demonstrates that they can reasonably and effectively categorize table tennis matches with similar competitive performances. Historically, cluster analysis has been primarily applied in the field of sports for assessing the physical health of youth and evaluating the efficiency of competitive sports. For example, cluster analysis can effectively classify the levels of physical health risk among young people70. It can also categorize the competitive strength levels of different countries’ sports programs based on the number of medals won in major international sports competitions, thus effectively identifying the differences between strong, potentially strong, and weak sports programs71. However, in the realm of technical and tactical analysis in ball sports, research results utilizing cluster analysis are relatively scarce. In this study, we employed cluster analysis to generate visual and tabular data that clearly reflect the differences between the various matches. It also effectively categorized these 40 matches, with the categorized matches aligning closely with their actual outcomes. 
For example, in the match H23, when the athlete was competing against the world’s number 2 ranked player at the time, the weaknesses in their performance during different phases of the match became evident. They struggled in longer rallies, and their primary scoring techniques were not as effective in this particular match. Consequently, this match was classified into the third category, defined as “weaker competitive performance.” The categorization and definitions of other matches followed similar patterns. Therefore, in practice, cluster analysis can provide coaches with valuable guidance to better understand the differences in the overall competitive strengths of athletes in each match and offer improved strategies for the application of technical and tactical skills. In theory, cluster analysis can enrich the research achievements in the field of table tennis technical and tactical analysis. Furthermore, cluster analysis can be applied and extended to other skill-dominant, net-separated competitive sports.

Table 12 One-way ANOVA table for PC COMP score.

The value of combining both methods for evaluating table tennis competitive strength

This study combines cluster analysis with principal component analysis to better classify the overall competitive strengths of each match. Principal component analysis (PCA) effectively extracts crucial technical characteristics of table tennis athletes, reduces dimensionality, retains critical information, and identifies technical and tactical indicators influencing athletes’ competitive levels to evaluate their overall competitive strength throughout the entire match. On the other hand, cluster analysis, built upon principal component analysis, effectively categorizes matches based on their competitive states. For instance, as seen in the aforementioned study, matches categorized and defined as “weaker competitive performance” could assist in identifying the reasons behind athletes’ losses in matches. Apart from an athlete’s own underperformance in technical and tactical aspects, reasons for losing to opponents might involve factors such as adjusting to the opponent’s grip and playing styles (right-handed or left-handed grip, and various playing styles), or the psychological factor of facing highly-ranked opponents. Even though this study does not emphasize analyzing these factors in detail (due to their broad scope, which is beyond the main content of this paper), they could potentially become areas of interest for future studies. Cluster analysis successfully groups homogeneous matches into categories, aiding coaches in understanding athletes’ competitive statuses. However, both methods have their limitations. For instance, PCA is not suitable for nonlinear data, which may result in the loss of some information while simplifying data. Furthermore, it might not classify comprehensive rankings reasonably72. Systematic cluster analysis is not applicable for extensive datasets, which reduces its computational speed and increases complexity73. This research has combined both methods, bypassing the individual limitations of each. 
Using PCA for dimensionality reduction first lowers the computational complexity of the subsequent cluster analysis, and the clustering then provides a reasonable classification; together they increase analysis efficiency and complement each other. Previous studies on the evaluation of table tennis techniques and tactics have mainly employed traditional analytical methods, describing the scoring rates and usage rates of various indicators74. In recent years, with interdisciplinary integration, research methods in the field of table tennis technical and tactical analysis have progressed20. These include chi-square tests to qualitatively analyze scoring and losing points in table tennis matches75, logistic regression models for technical and tactical analysis76, and secondary moving average methods to analyze athletes’ match situations24,25. These methods have enriched the theoretical research system in the field of table tennis technical and tactical analysis to a certain extent. Moreover, some scholars have employed comprehensive evaluation methods to assess technical and tactical competitive performance in table tennis. For example, the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS), Rank Sum Ratio (RSR), and Grey Relational Analysis (GRA)20,77,78 can effectively evaluate various technical and tactical indicators in table tennis matches. However, applying a single comprehensive evaluation method can lead to inaccurate results due to non-parametric transformation and information loss79. Combining two methods lets each offset the other’s weaknesses, avoiding the limitations of using a single method80. Hence, the combination of principal component analysis and cluster analysis was effectively validated in this study’s empirical research on athletes’ competitive performance in table tennis matches. 
It can rank the overall competitive strength of each match and categorize matches with similar competitive strengths, to a certain extent, reflecting the comprehensive competitive level of athletes in each match. This combined approach has theoretical reference value and practical guidance for coaches to adjust match strategies, enhance athletes’ strength, and improve their performance. Additionally, this combined method is also worth being promoted and applied in other competitive sports.

Advantages and limitations of this study

This study holds several advantages in the field of table tennis technical and tactical analysis. Firstly, compared to traditional research methods, the PCA-CA approach effectively evaluates the overall competitive strength of each match and categorizes matches with similar competitive strengths. This helps mitigate the limitations of using a single evaluation method. Secondly, this study is the first to employ the “New Four-Phase Index Statistics Method,” which effectively addresses issues like data mismatch, complex statistical processes, and data omissions that are common in traditional statistical methods. The observational indicators selected in this technical and tactical statistical method, when applied within the PCA-CA framework, effectively reflect the performance levels of athletes in each match. However, this study also has some limitations, such as the limited number of match videos collected and the exclusion of female athletes’ competitive performances in this context. It does not emphasize the impact of athletes’ playing style, grip type, and psychological characteristics on matches. The study is primarily focused on quantitative analysis and does not combine qualitative research.

Conclusion

(1) The “New Four-Phase Index” consisting of eight sub-observational indicators is suitable for application and evaluation within the PCA-CA comprehensive model for assessing table tennis technical efficiency. Compared to traditional technical and tactical statistical methods, the “New Four-Phase Index” demonstrates greater feasibility in this research’s match data statistics and analysis.

(2) The eight technical and tactical indicators from the 40 matches can be extracted into four principal components based on the eigenvalue ≥ 1 criterion, reflecting 75.343% of the information in the original variables. The results of the principal component comprehensive ranking of the 40 matches are generally consistent with the match outcomes. However, there is a 5% probability of inconsistencies due to the phenomenon of “imbalance between total points and match results.”

(3) Cluster analysis categorizes the comprehensive competitive strength of the 40 matches into three groups: “fluctuating competitive level,” “stable competitive level,” and “weaker competitive performance.” Variance analysis shows significant statistical differences between these three categories (P < 0.05).

(4) The PCA-CA comprehensive analysis model applied to the empirical study of male table tennis players’ competitive performances in matches received a good validation. The combination of these two methods offers a complementary advantage, avoids the limitations of using a single method, and effectively reflects athletes’ comprehensive competitive levels in each match. This approach is helpful for coaches in understanding athletes’ competitive states and devising corresponding match strategies. The evaluation method can also be appropriately extended and applied to other competitive sports.