Machine learning driven clustering for silhouetting 5G network throughput

Ramesh, Parameswaran; Bhuvaneswari, P. T. V.

doi:10.1038/s41598-026-45902-6

Article
Open access
Published: 30 March 2026

Parameswaran Ramesh¹ & P. T. V. Bhuvaneswari¹,²

Scientific Reports volume 16, Article number: 10583 (2026)

Abstract

Compared with previous generations, the 5G enhanced mobile broadband (eMBB) application delivers better connectivity, faster data speeds, and improved user support. The goal of this study is to improve uplink data transmission rates for 5G user equipment (UE). Python is used for data analysis and framework building. The study considers a 250-m-radius Picocell Base Station (PBS) serving 15 UEs, whose positions within the cell follow a Poisson distribution. The PBS receives channel state information (CSI) from the UEs and uses it to assess the state of the signal transmission channel. Rayleigh and Rician fading models and free-space and log-distance path loss models are used to generate a dynamic dataset of channel states. K-means clustering groups the UEs according to their service-specific needs, with channel gain, transmission rate, and minimum service information rate as the features that characterize each UE. By concatenating bandwidth, clustering improves system performance and maximizes the cumulative rate of all UEs. After grouping, users in Cluster 3 had the highest cumulative rate of 9.52 Mbps, and the average rate across clusters was 7.52 Mbps. In addition to increasing system capacity, bandwidth concatenation satisfied the service needs of every UE. The performance criteria of several clustering models were evaluated, and K-means was found to be the best method. The method was designed systematically to satisfy the goals of the study. This research investigates beamforming capabilities and adaptive clustering to improve user fairness and efficiency.

Introduction

The use of wireless communication has developed into a revolutionary force in the contemporary period, with each new generation of technology providing innovative capabilities that have had a dramatic impact on how we interact with the digital world. 5G wireless technology is the most recent standard for mobile connectivity. 5G leverages advancements from previous generations to enhance capacity, reduce latency, accelerate data transmission, and accommodate a significant number of interconnected devices¹. This state-of-the-art technology has the potential to transform transportation, healthcare, telecommunications, and entertainment industries. It may also enhance user experiences by enabling quicker downloads and continuous connection².

One of the biggest disadvantages of 5G technology is its restricted worldwide coverage, which mostly helps urban regions³. Rural areas may not have access to 5G technology for many years. 5G technologies provide high-speed download rates, with transfer speeds capped at 100 Mbps, which represents a significant improvement compared to 4G. Enhancements in mobile phone battery technology are necessary for the successful implementation of 5G. 5G enables several applications, such as massive Machine Type Communications (mMTC), enhanced Mobile Broadband (eMBB), and Ultra-Reliable Low Latency Communications (URLLC), which are widely used.

Sum rate refers to the highest aggregate data rate that all users within a given region can achieve simultaneously. This crucial factor impacts the capacity of the network and the overall experience of the user. 5G uses the available frequencies more effectively, manages interference, and allocates resources in order to maximise the overall data transmission rate. Serving many consumers efficiently and providing diverse services dependably requires high sum rates⁴. In 5G, sum rate quantifies the information flow of a communications network, especially in environments with many users. Network capacity and throughput are both improved by raising the overall rate⁵. 5G networks can accommodate a larger number of users and provide faster data transmission by efficiently allocating resources to maximise the overall data rate. This optimisation also enhances quality of service (QoS) metrics such as latency and reliability. Sum rate optimisation helps a network adjust and perform well in changing communication environments⁶. Nevertheless, the need for intricate optimisation techniques and sophisticated interference management presents a formidable obstacle to attaining the highest possible data transmission rates in 5G networks. Sum rate optimisation may not be stable or scalable in large networks because of its vulnerability to channel fluctuations and trade-offs among performance measures. Sum rate optimisation studies aim to improve 5G communication networks despite these challenges⁷.

5G eMBB offers faster data rates, connectivity, and satisfaction for users than previous generations. Achieving rates of many gigabits per second is the goal of eMBB. This accelerates the transmission of large files, high-definition films, and data-heavy apps. Virtual reality, augmented reality, and interactive multimedia all have data rate limitations⁸. It is truly remarkable how many people can access and transmit data at fast speeds on 5G networks. In order to maximize the utilization of the available spectrum, 5G plans to implement improvements such as beamforming, spectrum sharing, massive multiple-input multiple-output (MIMO) technology, extensive modulation methods, and more⁹. With better frequency spectrum use, the network capacity and data transfer rates may be increased, leading to eMBB services. There is a high demand for transmitting data at high speeds, thus contemporary communication systems need to maximise the total rates, which represent the amount of data that can be sent. The use of heuristics and rigid resource allocation algorithms may not effectively use dynamic and complex communication environments¹⁰.

Through data-driven insights, Machine Learning (ML) may improve communication system components by increasing data transfer speeds¹¹. Channel prediction is a particular area of expertise in machine learning. Machine learning models have the ability to forecast channel conditions based on past data. These models predict the decrease in signal strength, the variation in signal strength over time, and the disruption caused by other signals in order to optimise the transmission settings for a stable channel and maximum data transfer rates¹². The algorithms have the capability to dynamically allocate bandwidth, power, and time periods based on real-time feedback and optimisation objectives. Intelligent resource allocation enables machine learning-driven systems to optimise the total data transmission rates and guarantee fairness and efficiency in complex and diverse network environments¹³. Advanced machine learning techniques may enhance communication system total rates by optimising channel prediction, resource allocation, interference management, and modulation approaches¹⁴.

This research is motivated by the critical need to optimize uplink data transmission in 5G networks, focusing on enhancing the capabilities of Enhanced Mobile Broadband (eMBB). The investigation of a 100 to 250-m-radius center-excited Picocell Base Station (PBS) with 15 users provides a microcosm of real-world scenarios where users are densely packed and demand high-speed uplink capabilities. Channel models such as Rayleigh and Rician fading and free-space and log-distance path loss are used to create a realistic and dynamic channel state dataset.

Literature survey

In¹⁵, the authors improved multicarrier NOMA sub channel and power allocation. The goal was to maximise data transfer while keeping user power gaps above a specific threshold. Sub channel allocation (SA) and power allocation (PA) comprised the optimisation process. Four algorithms were evaluated for SA, while a low-complexity technique was offered for PA. The study compared the suggested procedure to benchmarks. Best sum rate performance was achieved with the worst case first (WCF) algorithm with PA.

A multi-objective optimisation method that maximises both sum rate and user fairness in downlink non-orthogonal multiple access (NOMA) networks was proposed in¹⁶. The weighted sum method was used to convert the multi-objective optimisation problem into a single-objective one, which was then solved using the Karush–Kuhn–Tucker (KKT) conditions and Lagrange dual decomposition. The simulation shows that the proposed NOMA strategy improves sum rate and user fairness. The study’s iterative power allocation (PA) algorithm converges quickly and is compared to benchmarks.

Network sum-rate maximisation was explored in network-coded clustered uplink non-orthogonal multiple-access (NOMA) networks with relays that can perform SWIPT¹⁷. An iterative three-layer method is devised to address the user and relay power allocation and power-splitting per relay sub-problem with low complexity. Simulation results show that the suggested approach efficiently maximises network sum-rate and produces near-optimal solutions to meet QoS restrictions. An offline resource allocation method using a statistical CSI-based Markov model to handle data arrival restrictions has also been examined.

In¹⁸, the authors explored sum rate maximisation and energy-efficient power allocation (PA) in a multi-cluster multi-user MIMO-NOMA system with multiple users per cluster. Each user must meet a minimum rate to ensure service quality. Energy efficiency (EE) is maximised under total transmit power and QoS constraints. The authors derive a mathematical expression for sum rate maximisation and use non-convex fractional programming to maximise EE. Simulations reveal that the PA techniques outperform OMA in EE and cumulative rate.

The authors examined the downlink of an OFDM-based Non-Orthogonal Multiple Access (NOMA) system as presented in¹⁹. To optimize the system’s weighted total rate, the researchers proposed co-channel user set selection, power distribution among multiplexed users on each sub-band, and power allocation across sub-bands. A greedy algorithm for user selection and an iterative sub-optimal algorithm for power distribution in each sub-band make up the two-step heuristic technique. Simulations show that the proposed methods and greedy user selection approach improve mean spectral efficiency by over 15%. The study examines how user number affects system performance.

Data-driven network performance trend detection using unsupervised machine learning and Conditional Independence Tests (CITs) was described in²⁰. Crowdsourcing data from 5G User Equipment (UEs) and an LTE network dataset were used to test the K-Means clustering algorithm. Assessments showed that uplink throughput had the biggest impact on performance patterns. The Physical Uplink Control Channel (PUCCH) signalling resources also affected the user equipment’s uplink data transfer rate in the LTE dataset. Future developments will use deep learning algorithms for time-series data analysis. A reinforcement learning-based scheduling method for full-duplex wireless networks optimises radio resource allocation by grouping users²¹. The algorithm does not estimate user-to-user channels. The study used reinforcement learning to simplify OFDMA wireless network scheduling and improve spectral efficiency. The method performed well with clustered user equipment. The technique ignores inter-cell interferences and focuses on single cells. Multiple cells and scheduling strategies will be studied in the future.

Reinforcement learning-based NOMA user clustering and power allocation algorithms were examined in²². Power allocation and data rate are optimised using Q-learning. K-means clusters users by channel gain and maximises data rate. Extensive simulations show that the Q-learning technique with user clustering achieves the highest cumulative data-rate. It can also overcome NOMA transmission power budget and minimum user data rate constraints. In²³, MIMO Power-Domain NOMA and Constrained Optimisation Power Allocation (COPA) were used to cluster downlink networks. The Sum Capacity, system Bit Error Rate (BER), and modulation methods of various clustering algorithms on a multi-user MIMO-NOMA platform are analysed. Least channel correlation (LCC) outperforms Random clustering (RAN), Best with worst (BWW), and Best with best (BWB) clustering. Clustering with a proper precoding technique enhances spectral efficiency and BER, making it an attractive wireless network communication method.

The authors²⁴ designed a deep learning-based MIMO-NOMA system to maximise sum data throughput and energy efficiency. Multiple convolutional layers make up the communication deep neural network (CDNN). It employs training algorithms to solve the power allocation problem, improving data capacity and energy efficiency. Deep learning’s representation and mapping capabilities help the complicated MIMO system attain accurate CSI and improved SIC. The next research focus is on time-varying fading and system capacity restrictions. In²⁵, the authors optimised power allocation and user capacity in wireless networks with sub channel assignment limitations. Multiple linear regression models and Shannon capacity formula were used to calculate channel data. Test data predicts sub-channel capacity and solves optimisation models. Constraints were linear regression equations and power and capacity were variables. Allocating network segments to users to forecast network performance was also studied to improve wireless networks.

ML algorithms were used to optimise QoS, signal traffic load, and energy efficiency in green cellular networks²⁶. CoMP, which improves network coverage and data rate in LTE-Advanced networks, was also discussed. The study examined the power efficiency of green cellular communication under QoS limitations. It also details the power needed to transport information and examines how bandwidth limits affect latency. The paper also discussed cellular network energy-spectral efficiency trade-offs. The authors in²⁷ proposed utilising an artificial neural network (ANN) to cluster 5G NOMA system downlink users to maximise system throughput at minimum complexity. The ANN model is trained using NOMA user clusters, transmitting powers, and channel gains from a historical dataset. To avoid overfitting and accurately predict cluster formation, the model is validated to optimise hyper-parameters. In simulations, the ANN-based user clustering framework outperformed orthogonal multiple access (OMA) approaches and improved throughput over brute-force search while retaining clustering complexity.

Extreme learning machine-based user clustering (ELM-UC)²⁸ learns quickly. Using channel gains and powers, this NOMA-compatible method rapidly predicts user cluster formation. The ELM architecture is well suited to UC tuning because it can use vast amounts of input data to make predictions. In terms of performance, ELM-UC outperforms brute-force search (B-FS), and it also surpasses dynamic user clustering and ANN-UC. The ELM-UC scheme addresses the lengthier learning period of UC schemes that rely on neural networks: it eliminates the requirement for laborious backpropagation learning by solving the output weights in a single step. The authors developed a downlink multiuser NOMA system with diversity combining at each user in²⁹. To boost total capacity and channel performance, the system used SC, EGC, and MRC. The system outperforms standard NOMA, with MRC diversity performing best. Increasing the number of diversity links reduces system outage. The best performance comes at the cost of high complexity, whereas OMA with diversity schemes always has a lower outage.

Two-stage user selection with proportional fairness for downlink non-orthogonal multiple access (NOMA) with zero-forcing beamforming is suggested in³⁰. The algorithm prioritises short-term fairness by using Jain’s fairness index to treat powerful users fairly and improves the throughput-fairness trade-off. The simulations show that the algorithm improved the user fairness index by 38.96% over the NOMA system while maintaining the system sum-rate. Compared with standard NOMA systems, the approach provides remarkable short-term fairness at large data rates. The authors of³¹ examined NOMA with successive interference cancellation (SIC) in downlink multiuser MIMO cellular networks. User equipment (UE) antenna clusters are dynamically formed to match or exceed base station (BS) transmit antenna clusters. Linear beamforming reduces inter-cluster interference, with all receive antennas in a cluster using the same beamforming vector. The system also distributed power dynamically to maximise cell capacity. The study also introduced a low-complexity multi-cluster zero-forcing beamforming (ZF-BF) user clustering scheme.

The authors of³² recommended using ML in wireless communications, focusing on SARSA (State-Action-Reward-State-Action) reinforcement learning to predict optimal beam designs. Ray tracing was utilised to find the best beamforming angles to reduce transmitter interference and increase system capacity. SARSA, Q-Learning, and Double Q-Learning achieved 99% accuracy with two transmitters and receivers. The best method, Double Q-Learning, needed more computational time. The research shows that classic reinforcement learning can compete with exhaustive search in lower execution times. A reinforcement learning (RL) approach to power allocation in power-domain non-orthogonal multiple access (NOMA) was given in³³. To optimise system spectral efficiency, the Base Station (BS) distributes power to each user. Dynamic power allocation coefficient selection is suggested using the Actor-Critic algorithm architecture. The Actor component calculates the Expected Error (EE) via gradient descent, whereas the Critic component approximates the value function using a compatible function. Simulations show that the Actor-Critic algorithm can reliably predict NOMA power allocation coefficients. Continuous iterations could maximise spectrum efficiency.

The authors in the aforementioned literature have examined various system models, each accompanied by a distinct set of constraints. The previously discussed works are fundamentally based on improving the sum rate in 5G by means of the strategic application of user clustering, power optimization, beamforming, and various machine learning algorithms. Nevertheless, the research endeavors exhibit several constraints, including the focus on a singular fading environment and the lack of a precise delineation of the system scenario for its implementation in real-world contexts. The studies have failed to elucidate certain channel parameters, including the components of path loss, and have not adequately considered the specific service requirements of individual users.

The proposed method gets around these problems by creating a good system scenario with a pico-cell base station situated in a hexagonal cell that could be shown as a real-time setting. Several channel conditions and path loss models were looked into when the dataset was being made to make it more like real channel conditions for signal transmission. We studied several clustering model performance criteria to find the optimal approach. This research focuses on enhancing the sum rate in 5G through user clustering, while also addressing the service requirements of all eMBB users within the cell. This constitutes the novelty of the proposed work. In the era of rapid digital transformation, the demand for high-speed, reliable mobile communication is paramount. The advent of 5G technology promises to revolutionize the mobile broadband experience by significantly enhancing data speeds, network capacity, and user experiences. However, to fully realize the potential of 5G, particularly in uplink scenarios, innovative solutions are required to address the challenges of efficient data transmission, user fairness, and overall system performance.

System model

The system model is developed for a picocell consisting of a base station and a fixed number of User Equipment (UE). It is assumed that the uplink of every user is active and transmitting to the base station. The Channel State Information (CSI) is computed from the equations in Table 1, which account for both the Rayleigh and Rician fading channel models along with the free-space and log-distance path loss models. The channel gain is calculated first. The signal-to-interference-plus-noise ratio (SINR), an important measure of link quality, is then determined from it. The SINR values are in turn used to calculate each user’s channel capacity. Table 1 shows the data rate of each user channel together with the system parameters.

Table 1 Mathematical model.

Channel gain, SINR, capacity, and sum rate are closely linked in the system model. Channel gain, influenced by path loss and fading, affects SINR by determining the strength of the received signal relative to interference and noise³⁴. Higher SINR values lead to increased channel capacity based on Shannon’s capacity formula. Each user’s capacity, as measured by their SINR values, affects the aggregate data rate since it adds to the system’s total sum rate. By combining these separate capabilities, the model’s total rate calculation sheds light on the system’s overall performance under different channel circumstances. Improving the total rate and system efficiency as a whole may be accomplished with the help of this thorough examination.
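The chain from path loss and fading to sum rate can be sketched in Python, the language the study reports using. Because the contents of Table 1 are not reproduced here, the formulas below are the standard textbook forms and should be read as assumptions rather than the authors' exact equations. The 3.5 GHz carrier, 1 MHz bandwidth, transmit power, noise power, UE distances, and all function names are illustrative.

```python
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def log_distance_pl_db(distance_m, freq_hz, exponent, ref_m=1.0):
    """Log-distance path loss: free-space loss at ref_m plus 10*n*log10(d/ref_m)."""
    return fspl_db(ref_m, freq_hz) + 10 * exponent * math.log10(distance_m / ref_m)


def rayleigh_power(rng):
    """|h|^2 for Rayleigh fading with unit mean power (no line-of-sight path)."""
    sigma = math.sqrt(0.5)
    return rng.gauss(0, sigma) ** 2 + rng.gauss(0, sigma) ** 2


def rician_power(rng, k_factor):
    """|h|^2 for Rician fading with unit mean power; k_factor is the LOS-to-scatter power ratio."""
    los = math.sqrt(k_factor / (k_factor + 1))
    sigma = math.sqrt(1 / (2 * (k_factor + 1)))
    return (los + rng.gauss(0, sigma)) ** 2 + rng.gauss(0, sigma) ** 2


def channel_gain(path_loss_db, fading_power):
    """Linear channel gain: fading power attenuated by the path loss."""
    return fading_power * 10 ** (-path_loss_db / 10)


def sinr(tx_power_w, gain, interference_w, noise_w):
    """Received signal power over interference plus noise (linear scale)."""
    return tx_power_w * gain / (interference_w + noise_w)


def capacity_bps(bandwidth_hz, sinr_linear):
    """Shannon capacity B*log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1 + sinr_linear)


rng = random.Random(1)
distances_m = [50.0, 120.0, 240.0]  # hypothetical UE distances
capacities = []
for d in distances_m:
    g = channel_gain(fspl_db(d, 3.5e9), rayleigh_power(rng))
    capacities.append(capacity_bps(1e6, sinr(0.2, g, 0.0, 1e-13)))
sum_rate_bps = sum(capacities)  # system sum rate = sum of per-user capacities
```

Swapping `rayleigh_power` for `rician_power`, or `fspl_db` for `log_distance_pl_db`, gives the other fading and path loss combinations named in the text.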

System scenario

A base station, strategically positioned at the center of a hexagonal cell, provides service to fifteen users within a picocell framework. The distribution of user deployment surrounding the base station adheres to a Poisson model. The examination centers on the conveyance of information from users to the base station, scrutinizing two fading channels alongside path loss models. The simulation data is meticulously structured into a dataset that encapsulates performance indicators across a range of conditions. Figure 1 presents a visual representation of the base station interaction with users, elucidating the spatial configuration and functional dynamics of the picocell system.

A Euclidean distance computation is used to determine the distance between each user and the base station. This is done by positioning the hexagonal cell in a four-quadrant system. Each user is given two coordinates, which are X and Y.
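A minimal sketch of the placement and distance step follows. It makes two simplifying assumptions that are not from the paper: the hexagonal cell is approximated by a disc of radius 250 m, and the Poisson deployment is sampled with a fixed count of 15 users, in which case the points of a homogeneous Poisson process are independent and uniform over the region. The function names are hypothetical.

```python
import math
import random


def place_users(n_users, radius_m, rng):
    """Draw n_users positions uniformly over a disc centred on the base station."""
    users = []
    for _ in range(n_users):
        r = radius_m * math.sqrt(rng.random())  # sqrt gives uniform density over area
        theta = 2 * math.pi * rng.random()
        users.append((r * math.cos(theta), r * math.sin(theta)))
    return users


def distance_to_bs(user_xy, bs_xy=(0.0, 0.0)):
    """Euclidean distance between a user's (X, Y) coordinates and the base station."""
    return math.hypot(user_xy[0] - bs_xy[0], user_xy[1] - bs_xy[1])


rng = random.Random(0)
users = place_users(15, 250.0, rng)
distances = [distance_to_bs(u) for u in users]
```

With the base station at the origin of the four-quadrant system, the sign of each coordinate identifies the quadrant a user falls in.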

Methodology

This section describes the configuration of the picocell system and the generation of a custom dataset, which motivates the use of machine learning techniques. It also highlights the challenges in model development and assesses the suitability of clustering algorithms for satisfying various service requirements. The analysis seeks to establish the efficacy of the algorithm in enhancing system performance and ensuring effective delivery of services. Figure 2 shows the block diagram of the proposed system, which enhances throughput through machine learning methods.

The dataset provides the raw data for analysis and optimization, making it essential. Channel parameters are calculated for each user so that the dataset mimics real-world channel conditions. This dataset supports analysis, informed decision-making, and optimization. The CSI equations cover free-space and log-distance path loss together with Rayleigh and Rician fading channels. After the channel gain is assessed, the SINR, a key communication quality indicator, is calculated. The model calculates each user’s channel capacity from the SINR values, reflecting the maximum data rate under the defined conditions. The system model thus links channel gain, SINR, capacity, and sum rate. Path loss and fading determine the channel gain, which sets the signal strength relative to interference and noise, that is, the SINR. A higher SINR raises the channel capacity according to Shannon’s formula, and the per-user capacities add up to the system sum rate. The model aggregates the capacities into the sum rate to evaluate system performance. This method gives a complete picture of system behaviour under different settings, facilitating analysis and improvement.

Channel capacity analysis

To meet user expectations, channel capacity and service needs must be assessed. This research optimizes the system for high-quality communication by analysing channel capacity and matching it with service needs. Performance benchmarks and optimization goals are defined in this stage. Table 2 presents the minimum data rates mandated for the services evaluated under the eMBB application in accordance with 3GPP specifications. Each UE is allocated a specific service based on its demand, with a minimum required data rate between 0.064 and 5 Mbps³⁵.

Table 2 Specifications of the services.

Choosing the best machine learning technique for user clustering is essential for efficient resource allocation and system optimization. K-means clustering³⁶ is preferred over DBSCAN and GMM because it classifies users effectively based on channel attributes, which enhances resource utilization and system performance. This choice facilitates model building. The project implements K-means, DBSCAN, and GMM for user clustering to improve the sum rate. Performance metrics such as the silhouette score and the Davies-Bouldin index justify K-means as the superior algorithm³⁷.

Model construction with CSI

Developing the model is essential to turning unprocessed data into useful insights. Clustering users on the basis of CSI data enables targeted tuning by identifying patterns in channel conditions. This phase is essential for optimising system efficiency and sum rates. Figure 3 shows the working mechanism of the CSI model.
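The clustering step can be illustrated with a small K-means (Lloyd's algorithm) sketch over per-user feature vectors. This is not the authors' code: the deterministic farthest-first seeding is chosen here only to make the example reproducible, and the six feature vectors (assumed to be already scaled channel gain, transmission rate, and minimum service rate) are hypothetical.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose nearest chosen centroid is farthest away."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical scaled features: (channel gain, transmission rate, minimum service rate)
features = [
    (0.10, 0.10, 0.10), (0.12, 0.08, 0.10),
    (0.50, 0.50, 0.50), (0.52, 0.48, 0.50),
    (0.90, 0.90, 0.90), (0.88, 0.92, 0.90),
]
labels, centroids = kmeans(features, k=3)
```

Each pair of neighbouring vectors ends up in its own cluster; the numeric cluster labels themselves carry no meaning beyond group membership.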

The k-means algorithm is based on Euclidean distance³⁸, so features measured on different scales bias cluster formation. The chosen features are channel gain, transmission rate, and minimum quality of service. Min-max scaling guarantees that every feature lies in the range zero to one, which prevents any single parameter from dominating the distance computation.

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
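The scaling above can be sketched in a few lines of Python. The helper names and the constant-column fallback (mapping every value to 0.0 when the maximum equals the minimum) are assumptions made for this example.

```python
def min_max_scale(values):
    """Map values to [0, 1] via (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_features(rows):
    """Scale each feature column of a list of per-user feature tuples independently."""
    scaled_columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [tuple(row) for row in zip(*scaled_columns)]


scaled = min_max_scale([2.0, 4.0, 6.0])
table = scale_features([(1.0, 10.0), (3.0, 30.0)])
```

Scaling column by column matters because channel gain is typically many orders of magnitude smaller than a rate expressed in Mbps, and would otherwise contribute almost nothing to the Euclidean distance.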

Sum rate enhancement evaluation

Assessing the sum rate prior to and following clustering serves to measure the efficacy of the optimization strategy. Through a comparison of system performance metrics, it figures out how user clustering affects overall throughput. This helps with planning future improvements. The satisfaction of users and the usability of systems are contingent upon criteria that are specific to the service in question. Our objective is to meet the service needs of each hexagonal-cell eMBB user comprehensively. Focusing on the allocation of system capacity to meet these objectives enhances user experience and underscores the practical and substantial impacts of research.

Results and discussion

Performance indicators from exhaustive simulations are also examined. This comprehensive examination compares system performance before and after clustering. Clustering approaches improve system performance, revealing benefits. A thorough evaluation of user clustering machine learning models is also presented.

Simulation parameters

Table 3 shows the simulation parameters of the obtained Free-space Rayleigh fading channel for the system scenario. For the specified system, the capacity and sum rates are calculated for all UEs in the system.

Table 3 Simulation parameters.

Performance metrics

Table 4 shows the calculated channel gain, sum rate, and SINR achieved by the users in this simulation. The capacity and sum rate are calculated for every UE in the considered system.

Table 4 Calculated channel gain, Sum Rate and SINR.

Performance analysis

This section analyses performance indicators (sum rate, ML model assessment metrics) before and after clustering and compares three ML models to find the optimal user clustering model.

Analysis of user clustering

Figure 4 shows the system capacity attained by each user alongside the minimum capacity required to meet that UE’s service-specific needs. Nine of the fifteen users (60%) met their minimum service requirements, while the remaining six did not. Figure 5 depicts the computed throughput for the 15 users, with user 8 attaining the highest value of 3.45 Mbps and user 15 attaining 0.06 Mbps. The service demand distribution for the different services considered in this research is illustrated in Fig. 6.
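The per-user comparison of attained capacity against the service minimum reduces to a simple check, sketched below. The four rate pairs are hypothetical placeholders, not values from the paper's tables, and the function name is illustrative.

```python
def satisfied_users(achieved_mbps, required_mbps):
    """Return the 1-based indices of users whose achieved rate meets the service minimum."""
    return [
        i
        for i, (got, need) in enumerate(zip(achieved_mbps, required_mbps), start=1)
        if got >= need
    ]


achieved = [3.45, 0.06, 1.20, 0.50]   # hypothetical achieved rates, Mbps
required = [2.00, 0.064, 0.384, 2.00]  # hypothetical service minimums, Mbps
ok = satisfied_users(achieved, required)
ratio = len(ok) / len(achieved)  # fraction of users whose service need is met
```

Applied to the study's 15 users, the same ratio would be 9/15 = 0.6 before clustering.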

Figure 7 illustrates the user equipment’s association with specific clusters following the clustering process, along with their respective coordinates within the picocell. Various shapes represent user entities that are categorized into three distinct clusters.

A distinct user cluster, categorized according to their individual service requirements and CSI, is depicted in Fig. 8. The height of each bar represents the aggregate data transfer rate, measured in Mbps, attained by users inside the corresponding cluster. The calculation of the total rate is obtained by applying Shannon’s Capacity Formula, which takes into consideration many aspects including bandwidth and SINR. As a result, clusters with higher sum-rate bars are more effective at providing the service that UEs in such clusters require. Furthermore, it is important to note that 7.52 Mbps is the computed average throughput for all clusters.
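As an illustration of how per-cluster bars of this kind can be computed, the sketch below evaluates Shannon's formula per user and adds the rates within each cluster. The cluster labels, bandwidths, and SINR values are hypothetical, and the paper's own bandwidth concatenation step is not modelled.

```python
import math
from collections import defaultdict


def shannon_rate_mbps(bandwidth_mhz, sinr_linear):
    """Shannon rate in Mbps for a bandwidth in MHz: B * log2(1 + SINR)."""
    return bandwidth_mhz * math.log2(1 + sinr_linear)


def cluster_sum_rates(labels, rates_mbps):
    """Add up the user rates within each cluster label."""
    totals = defaultdict(float)
    for label, rate in zip(labels, rates_mbps):
        totals[label] += rate
    return dict(totals)


labels = [1, 1, 2]              # hypothetical cluster assignment
sinrs = [1.0, 3.0, 15.0]        # hypothetical linear SINR values
rates = [shannon_rate_mbps(1.0, s) for s in sinrs]
per_cluster = cluster_sum_rates(labels, rates)
average_mbps = sum(per_cluster.values()) / len(per_cluster)
```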

Figure 8 illustrates the sum rate for the users in clusters 1, 2, and 3. It further affirms the robustness of the proposed CSI-driven clustering framework, and the sum rate of the users in cluster 3 is the highest among all the clusters. This is probably due to better channel conditions or a larger resource allocation to satisfy their high QoS requirements. The performance of cluster 1 and cluster 2 users is moderate and low, respectively. The proposed model validates the effective grouping of users according to their channel conditions and service needs in order to foster increased system throughput.

Figure 8 shows which cluster each UE belongs to and the distance of the UE from the BS. Figure 9 shows the calculation of throughput after clustering. Figure 10 shows the cluster-pairing algorithm used in this research and how it uses the capacity of each user. Figure 11 shows the calculation of throughput after clustering.

When comparing the minimum needed throughput to the capacity attained post-clustering, it is clear from Fig. 12 that each user is able to meet their service needs. This makes sure that clustering increases the global system capacity, which raises the system’s throughput and meets each user’s service-specific needs at the same time.

ML model comparison

Several machine learning models are tested for user clustering: K-means, DBSCAN, and Gaussian Mixture Models (GMM). Both the silhouette score and the Davies-Bouldin index (DBI) are used for evaluation. A higher silhouette score indicates that the clusters are more distinct, with better cohesion within clusters and better separation between them. The DBI compares the distances between cluster centres with the cluster diameters to determine how well the clusters are separated; lower values indicate better clustering. Comparing the two measures across the clustering models reveals the advantages and disadvantages of each technique. The results for both metrics in Fig. 13 indicate that the K-means model performed better than the other two models.
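The two metrics can be written out directly from their standard definitions, as in the sketch below. It is a plain-Python illustration rather than the implementation used in the study, it assumes at least two clusters and non-coincident centroids, and the four one-dimensional points are a toy example.

```python
import math


def _groups(points, labels):
    """Collect the points belonging to each cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b); a = mean intra-cluster distance,
    b = mean distance to the nearest other cluster."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in groups.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    groups = _groups(points, labels)
    cent = {k: tuple(sum(c) / len(g) for c in zip(*g)) for k, g in groups.items()}
    scat = {k: sum(math.dist(p, cent[k]) for p in g) / len(g) for k, g in groups.items()}
    worst = [
        max((scat[i] + scat[j]) / math.dist(cent[i], cent[j]) for j in groups if j != i)
        for i in groups
    ]
    return sum(worst) / len(worst)


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
sil_good, sil_bad = silhouette_score(pts, good), silhouette_score(pts, bad)
dbi_good, dbi_bad = davies_bouldin_index(pts, good), davies_bouldin_index(pts, bad)
```

On this toy data the natural grouping scores a higher silhouette and a lower Davies-Bouldin index than the interleaved one, which is the direction of comparison used to rank the three models.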

Complexity analysis

Figure 14 shows the estimated execution time versus the number of users for hierarchical clustering and the CSI-driven K-means. The proposed method, with O(N) complexity, shows a linear increase in execution time, consistent with the theoretical complexity analysis. The execution time of subtractive clustering and hierarchical clustering, with O(N²) complexity, follows a quadratic growth trend and becomes dominant at higher user densities. The execution time of both methods is low for small networks, but the gap widens rapidly with network size because one cost grows quadratically and the other linearly. Owing to this scalability, the proposed method is suitable for real-time dense 5G picocell scenarios.

The log-scale plot in Fig. 15 exhibits a striking difference in scalability between the proposed CSI-driven K-means framework and hierarchical clustering. On the log scale, the values of the proposed approach lie on an almost straight line with a mild slope, consistent with linear growth, that is, a computational complexity of O(N). Hierarchical clustering shows a steeper upward trend owing to its O(N²) complexity. As more users join, the two curves diverge, with the hierarchical clustering curve rising faster. The log-scale plot illustrates the asymptotic behaviour of the two approaches; the proposed approach scales better.
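The scaling argument can be checked with a simple operation-count model. The sketch below is an illustration under stated assumptions rather than a reproduction of the measured execution times: it counts distance evaluations only, takes a fixed number of clusters (`k=3`) and a fixed iteration count (`iterations=10`) for K-means, and counts the initial pairwise distances for hierarchical clustering.

```python
import math

def kmeans_cost(n_users, k=3, iterations=10):
    """Distance evaluations for K-means with fixed k and iteration count: O(N)."""
    return n_users * k * iterations

def hierarchical_cost(n_users):
    """Pairwise distances needed to start agglomerative clustering: O(N^2)."""
    return n_users * (n_users - 1) // 2

def loglog_slope(cost, n_small, n_large):
    """Slope of log(cost) versus log(N) between two network sizes."""
    return (math.log(cost(n_large)) - math.log(cost(n_small))) / (
        math.log(n_large) - math.log(n_small)
    )

kmeans_slope = loglog_slope(kmeans_cost, 100, 10_000)
hier_slope = loglog_slope(hierarchical_cost, 100, 10_000)

# Ratio of the two cost models at increasing network sizes.
ratio = [hierarchical_cost(n) / kmeans_cost(n) for n in (100, 1_000, 10_000)]

print(f"log-log slope, K-means:      {kmeans_slope:.2f}")
print(f"log-log slope, hierarchical: {hier_slope:.2f}")
print("cost ratio (hierarchical / K-means):", [round(r, 2) for r in ratio])
```

On a log-log plot, a slope near 1 corresponds to linear cost and a slope near 2 to quadratic cost. In this model the ratio between the two costs grows roughly in proportion to N, so the gap is polynomial in the number of users.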

The cumulative system sum rate is shown in Fig. 16. The clustering curves are computed directly from the actual UE sum rates. The non-normalized curve exhibits a proportional degradation caused by the misalignment of feature scales. Clustering with normalization gives a higher cumulative throughput at every iteration and reaches 21.54 Mbps, against 19.39 Mbps without normalization, a gain of about 11%. The persistent widening of the gap supports the assertion that suitable feature scaling leads to more precise clustering and improved resource sharing, even in a small-scale picocell deployment. An ablation study of the contribution of feature normalization was conducted by comparing the system performance of the K-means clustering stage with and without normalization, as given in Table 5.
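The effect of misaligned feature scales on a distance-based method such as K-means can be illustrated with a small sketch. The normalization scheme is not specified in this section, so z-score scaling is an assumption here, and the four UE feature vectors are hypothetical values chosen only to make the effect visible.

```python
import math
import statistics

# Hypothetical UE features: (channel gain, achievable rate in Mbps, minimum rate in Mbps).
ues = [
    (1e-9, 2.0, 1.0),  # UE 0
    (9e-9, 2.2, 1.0),  # UE 1: similar rate to UE 0, very different channel gain
    (1e-9, 3.0, 1.0),  # UE 2: same channel gain as UE 0, moderately different rate
    (9e-9, 6.0, 3.0),  # UE 3
]

def zscore_columns(rows):
    """Scale each feature column to zero mean and unit population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def nearest_neighbour(rows, index):
    """Index of the row closest (Euclidean distance) to rows[index]."""
    others = [j for j in range(len(rows)) if j != index]
    return min(others, key=lambda j: math.dist(rows[index], rows[j]))

raw_nn = nearest_neighbour(ues, 0)
scaled_nn = nearest_neighbour(zscore_columns(ues), 0)
print("nearest UE to UE 0 without normalization:", raw_nn)
print("nearest UE to UE 0 with normalization:   ", scaled_nn)
```

Without scaling, the channel gain is numerically negligible next to the rates, so it has almost no influence on the distance. After scaling, all three features contribute on a comparable footing and the nearest neighbour of UE 0 changes.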

Table 5 Effect of feature normalization.


Table 6 compares the computational characteristics of the different clustering methods. The quadratic time and space requirements of hierarchical clustering do not scale to dense 5G deployments. DBSCAN has a time complexity of O(n log n) but suffers performance degradation in large-scale deployments. The proposed CSI-driven K-means framework has a linear cost in both clustering time and clustering space. This linear cost provides excellent scalability and practical deployability for online uplink scheduling in 5G picocells.

Table 6 Comparison of complexity analysis.


K-means clustering is used primarily to improve the efficiency of the system. The channel gain, data rate, and minimum required service data rate of each user inform the bandwidth allocation, which improves the total sum rate of the system. Users in cluster 3 perform best, with a cumulative rate of 9.52 Mbps and an average rate of 7.82 Mbps. The study also examines the application of beamforming and adaptive clustering techniques to enhance user fairness. 5G technology is on course to change mobile broadband experiences significantly. Smartphone users consume more data every day, and enhanced mobile broadband is the response to this need. It will also support various new applications, along with services for fixed, mobile, and industrial users.

Conclusion and future work

This study examined four distinct channel scenarios, each defined by a different combination of fading channel and path loss model, to assess the system deployment scenario comprehensively. A dataset was generated for 15 users, each with a distinct set of channel conditions and tailored service requirements. Applying the K-means machine learning algorithm to user clustering significantly enhanced system efficiency, increased the total sum rate, and met the service demands of the users. Future investigation will focus on dynamic user clustering that integrates real-time QoS feedback to support adaptive management of network resources. By continuously observing and analysing real-time QoS metrics, such a system could re-cluster users dynamically to maintain performance and keep service requirements fulfilled. These advancements are expected to enhance network performance and user satisfaction, supporting the development of more efficient and user-focused 5G networks. The integration of advanced machine learning algorithms and adaptive clustering techniques marks a notable step towards enhanced mobile broadband (eMBB) and the provision of high-quality service to all network users.

The use of advanced machine learning techniques to enhance system performance needs further investigation. We will compare a wide range of methods, including K-means clustering, ELMs, CNNs, and DNNs³⁹, to identify the most efficient strategies for improving resource distribution and user satisfaction. Future work will also investigate beamforming systems designed to enhance signal delivery to individual users⁴⁰. Beamforming can improve the quality of the received signal by focusing transmission power on particular users, thereby enhancing their communication experience.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request and are also available at https://doi.org/10.5281/zenodo.18830367.

References

Dangi, R., Lalwani, P., Choudhary, G., You, I. & Pau, G. Study and investigation on 5G technology: A systematic review. Sensors 22, 26. https://doi.org/10.3390/s22010026 (2021).
Ramesh, P. et al. User association-based load balancing using reinforcement learning in 5G heterogeneous networks. J. Supercomput. 81, 328. https://doi.org/10.1007/s11227-024-06788-1 (2025).
Maulani, I. & Johansyah, C. The development of 5G technology and its implications for the industry. Devotion J. Commun. Service 4, 631–635. https://doi.org/10.36418/devotion.v4i2.416 (2023).
Tian, X. et al. Sum rate maximization in multi-cell multi-user networks: An inverse reinforcement learning-based approach. IEEE Wirel. Commun. Lett. 13(1), 4–8. https://doi.org/10.1109/LWC.2023.3292280 (2024).
Liu, Y. & Chen, W. Capacity analysis and sum rate maximization for the SCMA cellular network coexisting with D2D communications. China Commun. 19(10), 55–68. https://doi.org/10.23919/JCC.2022.10.004 (2022).
Ramesh, P., Sai, B. K. & Bhuvaneswari, P. T. V. Impact of LDPC encoded data in uplink of 5G PD-NOMA system. In 2023 International Conference on Intelligent Technologies for Sustainable Electric and Communications Systems (iTech SECOM), Coimbatore, India 490–494 (2023). https://doi.org/10.1109/iTechSECOM59882.2023.10435278
She, C., Dong, R., Hardjawana, W., Li, Y. & Vucetic, B. Optimizing resource allocation for 5G services with diverse quality-of-service requirements. In 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA 1–6 (2019). https://doi.org/10.1109/GLOBECOM38437.2019.9014271
Ramesh, P., Bhuvaneswari, P. T. V., Ashok, R. S. & Veena, S. Machine learning-based throughput enhancement in fifth-generation networks. Bull. Pol. Acad. Sci. Tech. Sci. 6, e153426–e153426 (2025).
Kim, H. Enhanced mobile broadband communication systems. In Design and Optimization for 5G Wireless Communications 239–302 (IEEE, 2020). https://doi.org/10.1002/9781119494492.ch7
Adnan, N. H. M., Rafiqul, I. M. & Alam, A. H. M. Z. Massive MIMO for fifth generation (5G): Opportunities and challenges. In 2016 International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur 47–52 (2016). https://doi.org/10.1109/ICCCE.2016.23
Adebusola, J. A., Ariyo, A. A., Elisha, O. A., Olubunmi, A. M. & Julius, O. O. An overview of 5G technology. In 2020 International Conference in Mathematics, Computer Engineering and Computer Science (ICMCECS), Ayobo, Nigeria 1–4 (2020). https://doi.org/10.1109/ICMCECS47690.2020.240853
Karthick, K. Comprehensive overview of optimization techniques in machine learning training. Control Syst. Optim. Lett. https://doi.org/10.59247/csol.v2i1.69 (2024).
Yuan, J., Ngo, H. Q. & Matthaiou, M. Machine learning-based channel prediction in massive MIMO with channel aging. IEEE Trans. Wirel. Commun. 19(5), 2960–2973. https://doi.org/10.1109/TWC.2020.2969627 (2020).
Joloudari, J. H., Alizadehsani, R., Nodehi, I., Mojrian, S., Sanaz, F., Shirkharkolaie, S. K., Dipu, H. M., San, T. R. & Acharya, U. R. Resource allocation optimization using artificial intelligence methods in various computing paradigms: A review (2022). https://doi.org/10.13140/RG.2.2.32857.39522
Xu, T., Zhao, M., Yao, X. & Zhu, Y. An improved communication resource allocation strategy for wireless networks based on deep reinforcement learning. Comput. Commun. 188, 90–98 (2022).
Trankatwar, S. & Wali, P. Subchannel and power optimization for sum rate maximization in downlink multicarrier NOMA networks. Phys. Commun. 58, 102050 (2023).
Trankatwar, S. & Wali, P. Power allocation scheme for sum rate and fairness trade-off in downlink NOMA networks. Comput. Commun. 221, 78–89 (2024).
Baidas, M. W., Abdel Ghaffar, A. M. & Alsusa, E. Network sum-rate maximization for network-coded clustered uplink NOMA networks with SWIPT-enabled relays. Comput. Netw. 244, 110340 (2024).
Zeng, M., Yadav, A., Dobre, O. A. & Poor, H. V. Energy-efficient power allocation for MIMO-NOMA with multiple users in a cluster. IEEE Access 6, 5170–5181. https://doi.org/10.1109/ACCESS.2017.2779855 (2018).
Parida, P. & Das, S. S. Power allocation in OFDM based NOMA systems: A DC programming approach. In 2014 IEEE Globecom Workshops (GC Wkshps), Austin, TX, USA 1026–1031 (2014). https://doi.org/10.1109/GLOCOMW.2014.7063568
Sousa, M., Vieira, P., Queluz, M. P. & Rodrigues, A. Towards the use of unsupervised causal learning in wireless networks operation. J. King Saud Univ. Comput. Inf. Sci. 35(9), 101764 (2023).
Fawaz, H., El Helou, M., Lahoud, S. & Khawam, K. A reinforcement learning approach to queue-aware scheduling in full-duplex wireless networks. Comput. Netw. 189, 107893 (2021).
Rezwan, S., Shin, S. & Choi, W. Efficient user clustering and reinforcement learning based power allocation for NOMA systems. In 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Korea (South) 143–147 (2020). https://doi.org/10.1109/ICTC49870.2020.9289376
Jana, S. Proposition of a novel clustering methodology for multi-user MIMO-PD-NOMA. In 2023 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India 461–466 (2023). https://doi.org/10.1109/ICCCIS60361.2023.10425517
Huang, H. et al. Deep learning-based sum data rate and energy efficiency optimization for MIMO-NOMA systems. IEEE Trans. Wirel. Commun. 19(8), 5373–5388. https://doi.org/10.1109/TWC.2020.2992786 (2020).
Ramesh, P. et al. Software defined network architecture based network slicing in fifth generation networks. Inf. Midem-J. Microelectron. Electron. Compon. Mater. 54(2), 123–130 (2024).
Saeed, M. M., Saeed, R. A., Azim, M. A., Ali, E. S., Mokhtar, R. A. & Khalifa, O. Green machine learning approach for QoS improvement in cellular communications. In 2022 IEEE 2nd International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA), Sabratha, Libya 523–528 (2022). https://doi.org/10.1109/MI-STA54861.2022.9837585
Kumaresan, S. P., Tan, C. K. & Ng, Y. H. Efficient user clustering using a low-complexity artificial neural network (ANN) for 5G NOMA systems. IEEE Access 8, 179307–179316. https://doi.org/10.1109/ACCESS.2020.3027777 (2020).
Kumaresan, S. P., Tan, C. K. & Ng, Y. H. Extreme learning machine (ELM) for fast user clustering in downlink non-orthogonal multiple access (NOMA) 5G networks. IEEE Access 9, 130884–130894. https://doi.org/10.1109/ACCESS.2021.3114619 (2021).
Sharma, P., Kumar, A. & Bansal, M. Performance analysis of downlink NOMA system with diversity combining schemes over k-µ fading channel. In 2022 IEEE 6th Conference on Information and Communication Technology (CICT), Gwalior, India 1–5 (2022). https://doi.org/10.1109/CICT56698.2022.9997916
Al-Wani, M. M. et al. On short term fairness and throughput of user clustering for downlink non-orthogonal multiple access system. In 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur, Malaysia 1–6 (2019). https://doi.org/10.1109/VTCSpring.2019.8746330
Ramesh, P. et al. Efficient resource allocation for ultra reliable low latency communication delay minimization in fifth generation networks. Inf. Midem-J. Microelectron. Electron. Compon. Mater. 55(1), 37–45 (2025).
Zhang, S. et al. A dynamic power allocation scheme in power-domain NOMA using actor-critic reinforcement learning. In 2018 IEEE/CIC International Conference on Communications in China (ICCC), Beijing, China 719–723 (2018). https://doi.org/10.1109/ICCChina.2018.8641248
Cai, L., Xu, G. & Niyato, D. Att-GRU: An attention-enhanced gated recurrent unit for channel prediction in deep space communications. IEEE Trans. Veh. Technol. https://doi.org/10.1109/TVT.2025.3637170 (2025).
Ali, S., Hossain, E. & Kim, D. I. Non-orthogonal multiple access (NOMA) for downlink multiuser MIMO systems: User clustering, beamforming, and power allocation. IEEE Access 5, 565–577. https://doi.org/10.1109/ACCESS.2016.2646183 (2017).
Cai, L., Xu, G., Zhang, Q., Song, Z. & Zhang, W. Deep learning based channel estimation for deep-space communications. IEEE Trans. Veh. Technol. 74(12), 19743–19748. https://doi.org/10.1109/TVT.2025.3586859 (2025).
Savazzi, S., Nicoli, M. & Rampa, V. Federated learning with cooperating devices: A consensus approach for massive IoT networks. IEEE Internet Things J. 7(5), 4641–4654. https://doi.org/10.1109/JIOT.2020.2964162 (2020).
Ramesh, P., Sandhiya, S. & Sattainathan, S. Silhouette analysis based K-means clustering in 5G heterogenous network. In 2023 International Conference on Intelligent Technologies for Sustainable Electric and Communications Systems (iTech SECOM), Coimbatore, India 541–545 (2023). https://doi.org/10.1109/iTechSECOM59882.2023.10435234
Zeng, M., Luo, Y., Jiang, H. & Wang, Y. A joint cluster formation scheme with multilayer awareness for energy-harvesting supported D2D multicast communication. IEEE Trans. Wirel. Commun. 21(9), 7595–7608. https://doi.org/10.1109/TWC.2022.3159838 (2022).
Tan, K. & Zhu, C. Multi-agent deep reinforcement learning for vehicular resource allocation: A comparison study of different agent cooperation levels. IEEE Trans. Veh. Technol. 74(12), 18900–18913. https://doi.org/10.1109/TVT.2025.3582801 (2025).


Funding

No funding received for this research.

Author information

Authors and Affiliations

Department of Electronics Engineering, Madras Institute of Technology, Anna University, Chennai, Tamil Nadu, India
Parameswaran Ramesh & P. T. V. Bhuvaneswari
Centre for Internet of Things, Madras Institute of Technology, Anna University, Chennai, Tamil Nadu, India
P. T. V. Bhuvaneswari

Authors

Parameswaran Ramesh
P. T. V. Bhuvaneswari

Contributions

The research resulted from a collective effort, with all authors contributing collaboratively to its accomplishment.

Corresponding author

Correspondence to Parameswaran Ramesh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Ramesh, P., Bhuvaneswari, P.T.V. Machine learning driven clustering for silhouetting 5G network throughput. Sci Rep 16, 10583 (2026). https://doi.org/10.1038/s41598-026-45902-6


Received: 06 October 2025
Accepted: 23 March 2026
Published: 30 March 2026
Version of record: 31 March 2026
DOI: https://doi.org/10.1038/s41598-026-45902-6