Abstract
Across all fields, experts strive to collect and analyze massive amounts of data to extract meaningful insight. In response to this trend, Hadoop and Spark have emerged, and many organizations have adopted these platforms for big data storage and processing. In addition, data centers with powerful servers are constantly expanding to accommodate the ever-growing volume of data, causing significant costs and environmental problems due to their tremendous energy consumption. Single board computer (SBC) clusters have emerged as a promising alternative for efficient computing. Most SBCs rely on a microSD slot for data storage, which limits their ability to process massive data effectively. However, the latest generation Raspberry Pi (RPi), model 5B, provides a peripheral component interconnect express (PCIe) interface, enabling high-performance storage media, such as solid state drives (SSDs). This paper extensively investigates the practicability and potential of SBCs for terabyte-scale big data processing. We build an SBC Hadoop cluster using the most powerful, latest RPi 5B (8 GB of RAM) with a fast SSD attached via the PCIe interface and perform six widely known benchmarks with large data sizes (up to 2 TB). Furthermore, this paper discusses challenges and suggestions, including the effects of input/output (I/O) throughput, central processing unit (CPU) overclocking, power supply, and the trim command, all of which significantly affect SBC Hadoop performance. This comprehensive study concludes that integrating the enhanced computing of RPi 5B with unlocked I/O performance finally paves the way for a practical solution to real-world big data processing on SBC clusters.
Introduction
In contemporary society, the significance of big data has grown due to its potential across sectors, including business, health care, finance, and government1,2,3. This exponential data growth requires efficient data processing and analysis. Apache Hadoop and Spark have emerged to address these challenges, revolutionizing the market by providing scalable and efficient solutions for big data storage and processing4. Their advent marks a pivotal development in the field of big data analytics. Hadoop, using the Hadoop Distributed File System (HDFS) and the MapReduce processing framework, offers a scalable and fault-tolerant solution for managing massive datasets. Spark provides an in-memory processing framework that accelerates data computation and has become the preferred choice for real-time analytics, stream processing, and machine learning applications5.
The rapidly increasing scale of data heightens the need for efficient storage and processing capabilities. Cloud service providers, such as Amazon Web Services (AWS) and Microsoft Azure, have been expanding their global data center infrastructure to reduce cloud costs6. Data centers consume an enormous amount of electricity to operate and cool their systems. Today, geographical features, such as a cold climate, are commonly exploited to reduce data center heat generation and power consumption7. However, this approach is impractical for most enterprises (other than big tech companies). Introducing servers with high power efficiency for big data processing could provide a more direct solution. From this perspective, a single board computer (SBC) offers significant advantages in terms of power efficiency, making it a promising solution to these financial and environmental challenges.
The SBC has significantly evolved since its inception in the 1970s, with the market experiencing substantial growth following the launch of the Raspberry Pi (RPi) in 20128. Initially aimed at promoting computer science education, the RPi has transcended its original purpose to become a versatile tool embraced by hobbyists, educators, and professionals. With each new iteration, the capabilities of the RPi have been enhanced, offering more memory, better processing power, and improved connectivity. Researchers have begun investigating the use of RPi clusters in more challenging applications, such as big data processing and micro data centers4. However, the RPi has consistently relied on a micro secure digital (microSD) card as its storage medium. Due to the low performance and small capacity of microSD cards, processing big data at practical, useful speeds has been challenging9.
With the release of the latest generation RPi (i.e., RPi 5B) in late 2023, the peripheral component interconnect express (PCIe) interface was finally adopted, providing an option to mount modern high-performance storage media other than the slow microSD. In addition, the central processing unit (CPU) has been upgraded to a quad-core processor running at speeds of up to 2.4 GHz, the random access memory (RAM) has increased to 8 GB, and the Ethernet speed has improved to 1 Gbps.
Research on the potential use of SBCs for big data processing has been steadily conducted. However, conventional SBCs have had limitations in terms of integration with other hardware. To mitigate structural constraints on storage performance, some studies have adopted Network Attached Storage (NAS) or Universal Flash Storage (UFS) cards, but higher-performance storage media are still required for practical big data processing. With the introduction of a PCIe interface, the RPi 5B can be regarded as the first SBC model to effectively break free from storage media constraints. In this study, the concrete performance of big data processing in an SBC environment combined with an SSD is presented, and the potential of SBCs for big data processing is demonstrated.
This study investigates the feasibility of processing big data using an SBC cluster by natively integrating an M.2 solid state drive (SSD) into the latest generation RPi (i.e., RPi 5B with 8 GB of RAM) via a PCIe 3.0 (\(\times\)1) interface using a new hardware-attached-on-top (HAT) board. This hardware configuration of the RPi is currently the most powerful available for big data processing. First, this study extensively measures the performance of an individual RPi 5B to quantify its computational improvement over the previous generation (i.e., RPi 4B) released in 2019. Then, an SBC Hadoop cluster is built, consisting of one RPi 5B as a master node and eight RPi 5B units as worker nodes, and a series of representative benchmarks, including WordCount, TeraGen and TeraSort, Pi computation, Grep, and TestDFSIO, is used to evaluate the performance of the RPi Hadoop and Spark clusters. In addition, for a more objective evaluation, the RPi 5B cluster is compared with the newest generation of desktop computer, providing informative insight into the possibility of real-world big data processing using SBC Hadoop clusters.
Furthermore, this study examines the storage media effect on the big data processing platforms by comparing the differences between the fastest microSD and PCIe-based SSD storage options available for the RPi 5B. Last, a discussion and the findings are presented, covering CPU overclocking (2.4 to 3.0 GHz) to discuss the influence of CPU performance, the PCIe bandwidth (PCIe 2.0 to 3.0) to study the influence of storage bandwidth, the trim command to explore the garbage collection problem of flash memory, the power supply to verify the correlation between the power supply amount and RPi performance, and multiple application execution to evaluate the parallel computing capability.
To our knowledge, this study is quite comprehensive, covering several problems, findings, and suggestions for the feasibility of SBCs in real-world big data processing by adopting the most powerful RPi configurations. This study concludes that the latest significantly improved computing capability of RPi 5B with the fastest modern SSD storage media offers a practical solution for small, terabyte-scale big data processing on SBC Hadoop clusters. Unlike previous RPis, the data size is no longer a limitation because the RPi 5B Hadoop cluster effectively and stably expands processing and storage capabilities. Overall, the PCIe interface of RPi 5B is extremely beneficial.
The main contributions of this paper are as follows:
- Big data processing on a powerful SBC without an I/O bottleneck: Existing SBC-based big data processing has been limited by the performance of the SBC and its storage media constraints. In this study, the latest M.2 SSDs (500 GB) are directly connected to each SBC node via the PCIe interface, rather than other slower I/O interfaces, such as a universal serial bus (USB) port, to overcome the bottleneck in SBC-based big data processing. Thus, each SBC node improves storage performance by an average of 7\(\times\) for reading and 16\(\times\) for writing compared to one of the fastest microSD cards on the market. This study is the first to configure the strongest RPi 5B node to comprehensively explore the possibility of big data processing on an SBC Hadoop cluster using a terabyte-scale dataset (up to 2 TB; Section Raspberry Pi cluster performance).
- Challenges and predictions for the future: Most existing studies have primarily aimed to measure the experimental performance of each SBC Hadoop cluster because previous SBC nodes did not have sufficient computing capabilities; thus, these studies have focused on running Hadoop benchmarks. This study extensively discusses diverse challenges and findings that significantly affect SBC node performance. For example, a higher I/O throughput (PCIe 2.0 vs. 3.0) could not be fully utilized by RPi 5B due to an unexpected CPU bottleneck, even though its CPU performance had substantially improved, which was verified by the CPU overclocking experiments (2.4 vs. 3.0 GHz). This finding implies that I/O throughput expansion in future SBCs must be accompanied by corresponding CPU performance improvements. In addition to this suggestion, several challenges and experiments offer insight into the potential performance of future SBC models (Section Discussion).
- Extensive scale-out evaluation: We evaluate six well-known benchmarks for Hadoop and Spark. Each benchmark is conducted by varying the node count from one to eight with various data sizes. These comprehensive performance evaluations offer insight into several factors, such as hardware improvement effects and performance trends. We also measure diverse performance metrics (CPU, network, storage media performance, and power consumption) of the individual RPi models, 4B and 5B, investigating the relationship between the performance improvement of the individual node and that of the cluster. Comparing the performance of the SBC Hadoop cluster to that of a desktop computer offers insight into the future possibilities and opportunities for SBC Hadoop clusters in real-world big data processing (Sections Individual Raspberry Pi performance and Raspberry Pi cluster performance).
The remainder of this paper is organized as follows. Section Background knowledge provides an overview of the RPi and SSD. Next, Section Related work presents related studies. Then, Sections Individual Raspberry Pi performance and Raspberry Pi cluster performance present a variety of experimental results and analyses, and Section Discussion discusses the diverse challenges. Finally, Section Conclusion concludes the work.
Background knowledge
Raspberry Pi
The RPi is a series of SBCs developed by the RPi Foundation to promote the study of computer science10. Designed to make computing affordable and available to everyone, the RPi quickly gained popularity and has transformed into a powerful and versatile platform used in schools, businesses, and projects worldwide. The first model, RPi Model B (RPi B), was released in February 2012, and the series has significantly evolved since then. The RPi B features a 700 MHz single-core ARM processor with 256 MB of RAM, targeting educational use to promote computer science. Launched in 2015, RPi 2B offers a quad-core processor and 1 GB of RAM. Starting in 2016, RPi 3B introduced built-in Wi-Fi and Bluetooth. In 2019, RPi 4B included substantial upgrades, initially offering up to 4 GB of RAM (later upgraded to 8 GB), USB 3.0 ports, and dual 4K display support11.
The fifth and latest generation in the series, RPi 5B, was again significantly upgraded over its predecessor. It features a new Broadcom BCM2712 system-on-a-chip with four ARM Cortex-A76 cores clocked at 2.4 GHz, offering twice the computing performance of its predecessor, the RPi 4B. In addition, RPi 5B was enhanced with a VideoCore VII graphics processing unit (GPU) for better graphics and introduced a new RP1 chip for improved I/O handling. A PCIe interface provides expanded customization potential, such as a non-volatile memory express (NVMe) SSD or 10 Gb networking. The RPi 5B maintains its legacy of offering high performance at an affordable price. Each iteration has built on the success of its predecessors, cementing the reputation of the RPi as a powerful and versatile platform for a wide range of applications12.
Storage connection interface
Recently, SSDs have rapidly gained popularity due to their improved performance, reliability, and falling prices and have been widely adopted in personal computers and servers13,14. Early SSDs were connected to the motherboard using the serial advanced technology attachment (SATA) interface, the same interface used by traditional hard disk drives (HDDs)15.
However, the SATA interface was initially designed for conventional HDDs, making it difficult to exploit the SSD speed fully. An NVMe SSD connected to the PCIe interface has emerged to eliminate this bottleneck16. The PCIe interface resolves the limitations of conventional interfaces in terms of bandwidth and scalability. The number of lanes (\(\times\)1, \(\times\)4, \(\times\)8, \(\times\)16, etc.) allows bandwidth adjustments as needed. Although the SATA III SSD is limited to a maximum transfer rate of 6 GT/s, the PCIe 3.0 \(\times\)4 NVMe SSD offers transfer rates of up to 32 GT/s, and the PCIe 4.0 \(\times\)4 NVMe SSD offers transfer rates of up to 64 GT/s17. The PCI Special Interest Group recently released the PCIe 7.0 specification (up to 128 GT/s per lane), and plenty of room exists to improve SSD performance.
The RPi 5B officially supports the PCIe interface. Although the PCIe connection of the RPi 5B is certified for PCIe 2.0 speeds (5 GT/s), PCIe 3.0 speeds (8 GT/s) can be forced by configuring the PCIe option. The previous generation of RPi (i.e., RPi 4B) could connect an SSD only indirectly using a USB 3.0 port. However, the RPi 5B is the first generation to provide a native PCIe interface on the board, allowing a direct PCIe SSD connection. The USB 3.0 port, with a maximum bandwidth of 5 Gbps, limits the ability to fully exploit SSD performance, whereas PCIe allows scalability depending on the version and number of lanes. With the continuous advancement of big data and artificial intelligence (AI), the SBC must increasingly handle high bandwidths. Thus, the next generation RPi is expected to upgrade the PCIe version or increase the lane count to better exploit SSDs.
Related work
Many researchers have studied low-powered SBC clusters in diverse fields, such as edge computing18, cloud19, database20, blockchain21, AI22, and cryptography23. Among the many papers on SBC clusters, this section focuses on SBC-based big data processing because it is directly associated with this research.
Adnan et al.9 built a cluster using the Banana Pi M3 (octa-core 2 GHz CPU with 2 GB of RAM and 1 Gbps Ethernet) and evaluated its big data processing performance with two storage types: microSD and NAS. They aimed to address the limitations of the microSD as the primary storage medium for the SBC and proposed the NAS as an alternative. They evaluated the performance of Hadoop using the TeraSort benchmark for single-node and multinode configurations. Although the performance difference between the two storage options was just 2 seconds for a single node and 7 seconds for multiple nodes, they recommended the NAS due to its lower I/O wait time. However, they adopted a data size that is small for big data workloads (up to 2 GB for a single node and up to 4 GB for multiple nodes) and focused on the primary storage limitations of SBCs.
Qureshi and Koubaa24 explored the energy efficiency of SBC clusters for big data applications. They built an ARM-based RPi 2B cluster and an Odroid XU-4 cluster and evaluated the performance of these clusters. Their study claimed that SBC-based clusters are generally energy efficient, whereas the cost-to-performance ratio depends on the workload. For smaller workloads, the XU-4 cluster, comprising 20 Odroid XU-4 boards, is more cost-effective and power-efficient than the RPi cluster. For high-intensity tasks, such as TeraGen and TeraSort, the XU-4 cluster consumed significantly more energy. They concluded that the RPi cluster consistently underperformed on all benchmarks. However, they employed an older generation RPi (i.e., RPi 2B) with a 900 MHz quad-core ARM Cortex-A7 CPU and 1 GB of RAM, which did not have sufficient computing capabilities to run big data applications.
Lee et al.25 conducted an in-depth investigation into the challenges and potential of big data processing using an RPi 4B cluster. They constructed a five-node RPi 4B (quad-core 1.5 GHz CPU with 4 GB of RAM) cluster. This study focused on the effect of storage media performance in the RPi cluster using three portable storage media cards with various performance characteristics: a typical microSD, the fastest microSD, and UFS cards. The study claimed that faster storage media significantly improve SBC cluster performance, demonstrating a 1.3\(\times\) to 7.07\(\times\) performance improvement. They concluded that the RPi 4B cluster exhibited the potential to process actual big data.
Unlike these studies, this study explores the possibility of real-world big data processing by building the most powerful RPi Hadoop cluster and adopting terabyte-scale big data (up to 2 TB). Moreover, a variety of challenges and extreme experiments are presented to provide informative insight.
Individual Raspberry Pi performance
Hardware configurations
Raspberry Pi: 4B vs. 5B
First, we compared the capability of the individual RPi 4B and 5B. Table 1 presents the hardware specifications of both models, where RPi 5B displays performance improvements over RPi 4B, especially in the CPU. The number of CPU cores remains the same at four, but the clock speed increased from 1.5 to 2.4 GHz. The initial RPi 4B models with 1, 2, and 4 GB of RAM provided a 1.5 GHz CPU clock speed, and only the recently released RPi 4B with 8 GB of RAM has a 1.8 GHz CPU clock speed. An identical RAM size (8 GB) was selected for a fair evaluation. The most critical difference is the PCIe interface of RPi 5B, which provides a single PCIe 2.0 lane by default. However, we can easily switch the PCIe version of RPi 5B from 2.0 to 3.0 via the boot configuration, yielding a substantial 2\(\times\) I/O throughput improvement from 400 to 800 MB/s (Section PCIe Version: 2.0 vs. 3.0).
Storage media: Fastest MicroSD vs. PCIe-based SSD
Two types of storage media (microSD and NVMe SSD) were employed to investigate the effect of an NVMe SSD directly connected to the RPi via the PCIe interface. Since the first generation of RPi, the microSD has been used as the primary storage medium, and even the latest RPi 5B retains a single microSD card slot. An M.2 HAT board is required to attach an NVMe SSD to the RPi 5B via the PCIe interface. Because the SSD is an additional storage device, it must be mounted to be recognized by the operating system (OS) (e.g., mount /dev/nvme0n1 /mnt/nvme0n1). Table 2 presents the microSD (Samsung PRO Plus) and SSD (SK Hynix GOLD P31) specifications. Currently, the Samsung PRO Plus is recognized as one of the fastest microSD cards26. Moreover, the SK Hynix GOLD P31 is one of the best-selling PCIe 3.0-based NVMe SSDs worldwide27.
Peripheral components and accessories
HAT board: The HAT board is an essential component for connecting the RPi 5B with an SSD, and the Pineberry Pi HatDrive Bottom was used in this experiment. The HAT board connects the RPi 5B and SSD via an FPC ribbon cable, which also provides power. It supports both PCIe versions 2 and 3, and in this experiment, a HAT board compatible with the 2280 M.2 SSD form factor was employed.
Cooling fan: Both the RPi 4B and RPi 5B use cooling fans officially supported by Raspberry Pi. The RPi 4B fan is connected via the GPIO header, whereas the RPi 5B is designed with cooling in mind and provides a dedicated four-pin fan header. The maximum airflow of the RPi 4B fan is 1.4 CFM, whereas that of the RPi 5B fan is 1.09 CFM.
Power supply: The power supply is also officially supported by Raspberry Pi. However, the power requirements differ between the RPi 4B and RPi 5B. The RPi 4B is provided with a 15 W power supply, whereas the RPi 5B, due to its increased power consumption, requires a 27 W power supply. Using an RPi 4B power supply with the RPi 5B results in performance degradation. Details about power and performance are discussed in Section Discussion.
Power consumption
The Bplug S01 power meter was employed to measure the power consumption of each RPi model (4B and 5B). In idle mode, the power consumption was measured for 1 hour without running any programs. In stress mode, the CPU was subjected to the maximum load for 1 hour (stress --cpu 4 --timeout 3600). According to Table 3, both models consumed 1.8\(\times\) more power in stress mode than in idle mode. Compared to RPi 4B, RPi 5B consumed an average of 1.4\(\times\) more power in both modes due to its upgraded CPU performance. Therefore, RPi 4B uses a power supply capable of delivering 20 W, whereas RPi 5B employs a 27 W power supply to ensure sufficient power.
CPU performance
The CPU performance of RPi 4B and 5B was measured using the sysbench tool by varying the thread count. Although both models have the same core count (four cores), RPi 5B (2.4 GHz) has a higher clock speed than RPi 4B (1.8 GHz). The number of threads was increased from 1 to 16 (sysbench --num-threads=1 --test=cpu --cpu-max-prime=2000 run). As shown in Table 4, the performance doubles up to four threads and does not increase afterward because both RPi 4B and 5B are quad-core models. The RPi 5B displays an average of 1.74\(\times\) better performance than the RPi 4B.
Network performance
In a cluster, the network performance significantly affects the overall performance because cluster nodes communicate with each other intensively28. The RPi 4B and 5B have 1 Gbps Ethernet and dual-band 802.11ac wireless network interfaces. Table 5 presents their network bandwidth. An open-source network performance measurement tool called iPerf3 was employed to test the on-board raw network throughput. iPerf3 measures and analyzes network performance, including bandwidth, latency, packet loss rate, and other aspects of network connections. Measuring performance with iPerf3 requires two steps. First, a socket is opened on the server whose performance is measured (iperf3 -s). Then, the client sending data specifies the target server IP to transmit data (iperf3 -c 192.168.45.99 -i 1). No noticeable performance discrepancy was found because both models have the same network specifications. Therefore, the wired network throughput of both models indicates an average performance close to the maximum theoretical throughput of 1 Gbps.
Storage media performance
The iozone benchmark tool was employed to measure file system performance. The sequential and random read/write performance for Samsung PRO Plus (representing the fastest microSD card) and SK Hynix GOLD P31 (representing the most powerful PCIe 3.0 SSD) on RPi were measured using iozone. The total file size was set to 100 MB and the record size to 4 KB (a small I/O unit), 512 KB (a medium I/O unit), and 16 MB (a large I/O unit) to test the storage performance (iozone -e -I -a -s 100M -r 4k -r 512k -r 16M -i 0 -i 1 -i 2).
In Table 6, the microSD and NVMe SSD display weak performance at the 4 KB record size due to the significantly reduced efficiency of I/O operations when accessing data in very small blocks. As the record size increased from 4 KB to 16 MB, the read and write performance noticeably improved. For instance, the sequential read performance of RPi 5B increased by 3.42\(\times\) (microSD) and 4.31\(\times\) (NVMe SSD). Similarly, the sequential write performance of RPi 5B improved by an average of 6.05\(\times\) (microSD) and 6.07\(\times\) (NVMe SSD).
The CPU performance also significantly affects storage performance. The performance of the microSD card in RPi 5B improved substantially compared to RPi 4B (e.g., at the 4 KB record size, 2.81\(\times\) for sequential reads, 2.38\(\times\) for sequential writes, 3.5\(\times\) for random reads, and 2.7\(\times\) for random writes). However, although the CPU performance of RPi 5B noticeably improved, the microSD card has still not reached its maximum rated performance. This finding implies that the next generation of RPi might saturate the microSD card, particularly regarding write performance, by further upgrading the CPU.
PCIe 3.0 theoretically provides approximately 984 MB/s of bandwidth per lane. The NVMe SSD (SK Hynix GOLD P31) can provide up to 3.5 GB/s for reads and 3.2 GB/s for writes with four PCIe 3.0 lanes. However, due to the constraint of RPi 5B (i.e., a single PCIe lane), the actual performance of this NVMe SSD is limited to the bandwidth of a single PCIe 3.0 lane. Tables 6 and 7 present the NVMe SSD performance on RPi 5B. For the 16 MB record size, the SSD outperforms the fastest microSD card by an average of 9.8\(\times\) (read) and 15.98\(\times\) (write). Considering that only about one-fourth of the full performance of the SSD is exploited in RPi 5B, even greater SSD performance could be achieved if more PCIe lanes were added or the PCIe version were upgraded in the future.
Raspberry Pi cluster performance
Experimental setup
Cluster configurations
Figure 1 illustrates the RPi Hadoop cluster architecture, consisting of eight worker nodes and one master node, all connected to a 1 GbE network switch. Two RPi clusters (RPi 4B and 5B) were built for a more objective comparison. Table 8 lists the configurations of the RPi 4B and 5B clusters. A single master node was designated as the namenode. We increased the worker node count to 1, 2, 4, and 8, expanding the total cluster storage capacity accordingly (up to 2 TB for the RPi 4B cluster and 4 TB for the RPi 5B cluster). Ubuntu Server 23.10 (64-bit) was installed, as it is supported by the RPi Foundation as a general-purpose OS. Apache Hadoop (v3.3.6) and Spark (v3.5.0) were adopted for the experiments.
Hadoop and spark configurations
Tables 9 and 10 list Hadoop and Spark configurations. The number of reducers and replications in Hadoop and the number of instances and cores in Spark were adjusted for more effective evaluations. Other parameters were not changed to minimize any factors that could affect the experimental results.
Hadoop with a single reducer (the default) causes inefficient operations and poor performance; thus, we set the reducer count to two, considering performance and reliability. The Hadoop replication factor involves a trade-off between storage consumption and reliability. We set the replication count to one to evaluate a terabyte-scale dataset.
By default, Spark sets the number of executor instances to two, meaning that even with four nodes, only two nodes are operating; a four-node cluster and a two-node cluster would therefore perform identically. Thus, this option was adjusted to match the number of worker nodes. In addition, the number of cores was set to two, considering performance and stability.
Input data and method
Public data (2006.csv, 678 MB) from the American Statistical Association (ASA) were employed for an objective evaluation29. These data include flight arrival and departure information for every commercial aircraft operating in the United States between October 1987 and April 2008. We created datasets up to 32 GB (i.e., from 1 GB to 32 GB) to investigate performance trends and 2 TB for the terabyte-scale experiments. We did not manipulate the data and only appended the file to reach the desired size. Representative benchmarks, such as WordCount, TeraGen/TeraSort, Grep, Pi computation, and TestDFSIO, were each performed five times with Hadoop and Spark. The variability across runs is presented using 95% confidence intervals in all corresponding figures. The microSD and NVMe SSD were employed to explore the influence of storage media performance. In addition, the benchmarks were assessed using various cluster sizes by varying the cluster node count from one to eight.
Experimental results and analyses
WordCount
WordCount calculates the frequency of each word. Hadoop splits the input data into multiple blocks and performs map operations on each block in parallel. The map operation reads the input data and generates intermediate data (i.e., key (word) and value (count) pairs). After the map phase, Hadoop performs a shuffling step to send the mappers’ intermediate (key, value) pairs to the Reducers. The reducer takes the list of values associated with each key, aggregates these values, and produces the result.
Spark transforms the input data into a resilient distributed dataset (RDD) and splits each line into words to create a collection of words. The data are distributed across multiple worker nodes, and each worker processes them in parallel. Spark uses the map operation of the RDD to convert each word into a key-value pair of the form (word, 1). Finally, the reduceByKey operation aggregates the values for the same key (i.e., the same word).
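To make the Spark data flow above concrete, the following is a minimal PySpark sketch of a WordCount job. The HDFS input and output paths are hypothetical, and this is an illustration of the described flow rather than the exact benchmark code used in the experiments.

```python
# Minimal PySpark WordCount sketch (hypothetical HDFS paths; not the exact benchmark code)
from pyspark import SparkContext

sc = SparkContext(appName="WordCountSketch")

lines = sc.textFile("hdfs:///input/2006.csv")      # read the input data from HDFS
words = lines.flatMap(lambda line: line.split())   # split each line into words
pairs = words.map(lambda word: (word, 1))          # build (word, 1) key-value pairs
counts = pairs.reduceByKey(lambda a, b: a + b)     # aggregate counts for the same word
counts.saveAsTextFile("hdfs:///output/wordcount")  # write the final counts back to HDFS

sc.stop()
```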
Table 11 presents the intermediate data volume written to storage for the Hadoop and Spark WordCount benchmarks, offering insight into the workload characteristics of WordCount. The intermediate data were measured using the trim command (fstrim in Linux) supported by the OS. Section Discussion discusses the effects of the trim command further. WordCount generates intermediate data during the map phase. With 32 GB of input data, Hadoop and Spark produced 126.2 GB (3.94\(\times\)) and 20 GB (0.63\(\times\)) of intermediate data, respectively. Spark processes the data in memory, resulting in significantly less intermediate data than Hadoop. Generating a large volume of intermediate data suggests that the performance depends more on the storage and network performance. In Fig. 2, the network consumption of Hadoop is noticeably higher than that of Spark under the WordCount benchmark because the intermediate data in each node are transferred to the nodes running reducers.
Figure 3 presents the total execution time of the WordCount benchmark on the RPi 5B cluster. As the node count increases, the execution time decreases for both Hadoop and Spark. In addition, the NVMe SSD performs noticeably faster than the microSD, particularly under Hadoop WordCount, because Hadoop generates significantly more intermediate data than Spark WordCount; thus, the exceptional I/O performance of the NVMe SSD strongly influences Hadoop WordCount performance. For instance, the SSD performs faster than the microSD by an average of 2.25\(\times\) and 1.98\(\times\) under a single node and two nodes, respectively.
Tables S24 and S25 present the experimental results of WordCount on Hadoop and Spark with diverse configurations. As the data size increases from 1 to 32 GB, the total execution time also increases almost linearly. Similarly, as the node count doubles, WordCount for both Hadoop and Spark improves by an average of 1.57\(\times\) and 1.78\(\times\), respectively.
TeraGen and TeraSort
The TeraSort benchmark sorts large datasets as quickly as possible. This benchmark consists of three primary components: TeraGen, TeraSort, and TeraValidate. TeraGen generates random data, and TeraSort rearranges the generated data. Finally, TeraValidate verifies that the sorting was performed correctly.
TeraGen: Hadoop TeraGen randomly generates data records comprising unique keys and values during the map phase. Then, each mapper directly writes the data to the HDFS (no reduce phase is required). Spark TeraGen creates an empty RDD with the specified number of partitions, determines the amount of data to generate in each partition, and generates random data. As in Table 12, the workload of TeraGen generates almost no intermediate data because it only needs to write to the storage media. Therefore, as shown in Fig. 4, no data are transferred between nodes during the reduce phase.
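As a rough illustration of the Spark TeraGen flow just described, the sketch below generates random key-value records in parallel partitions and writes them directly to HDFS. The record layout, partition count, and paths are illustrative assumptions, not the benchmark's exact implementation.

```python
# Rough sketch of the Spark TeraGen idea (record layout, sizes, and paths are assumptions)
import random
import string
from pyspark import SparkContext

sc = SparkContext(appName="TeraGenSketch")

num_partitions = 64             # number of parallel generator tasks (illustrative)
records_per_partition = 100000  # records produced by each partition (illustrative)

def generate(_):
    # Produce random records consisting of a 10-character key and a 90-character value
    rnd = random.Random()
    for _ in range(records_per_partition):
        key = "".join(rnd.choice(string.ascii_letters) for _ in range(10))
        value = "".join(rnd.choice(string.ascii_letters) for _ in range(90))
        yield key + "\t" + value

# Each element of the seed RDD drives one generator partition; no shuffle is needed,
# so the output is written straight to storage, matching the write-only workload above.
sc.parallelize(range(num_partitions), num_partitions) \
  .flatMap(generate) \
  .saveAsTextFile("hdfs:///teragen/output")

sc.stop()
```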
Figure 5 presents the total execution time for the TeraGen benchmark. The performance difference between the microSD and SSD was more pronounced because TeraGen consists entirely of write workloads. Under the Spark WordCount benchmark with 16 GB of data on a single node, the performance gap between the microSD and SSD was 1.1\(\times\), whereas it was 2.66\(\times\) for TeraGen. TeraGen must write all its data to the storage media; thus, it does not benefit from the in-memory processing mechanism. Therefore, as the data size increases, the performance gap also increases, as demonstrated by Tables S26 and S27. For example, the performance gap between the SSD and microSD increases from 1.02\(\times\) (1 GB) to 3.01\(\times\) (32 GB) under Hadoop TeraGen.
TeraSort: Hadoop TeraSort sorts the data created by TeraGen. Mappers read input data blocks and convert them into key-value pairs (intermediate data). During the shuffle and sort phase, the data are sorted by keys. Finally, reducers generate the sorted final output. Spark reads the data in RDD form, redistributes the data into a new number of partitions, and sorts it by keys in each partition.
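The following is a minimal sketch of the Spark sorting step just described, assuming the tab-separated records produced by the TeraGen sketch above; the repartition count and paths are illustrative.

```python
# Minimal sketch of the Spark TeraSort step (assumes tab-separated "key<TAB>value" records)
from pyspark import SparkContext

sc = SparkContext(appName="TeraSortSketch")

# Read the generated records and parse them into (key, value) pairs
records = sc.textFile("hdfs:///teragen/output") \
            .map(lambda line: tuple(line.split("\t", 1)))

# Redistribute the data into a new number of partitions and sort globally by key;
# sortByKey triggers the shuffle that transfers intermediate data between nodes.
sorted_records = records.repartition(64).sortByKey()

# Write the sorted output, which is the same size as the input
sorted_records.map(lambda kv: kv[0] + "\t" + kv[1]) \
              .saveAsTextFile("hdfs:///terasort/output")

sc.stop()
```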
Table 13 presents the intermediate data volume for Hadoop and Spark TeraSort. Similar to WordCount, a significant amount of intermediate data was observed. Notably, Hadoop generated 4.2\(\times\) the input data size as intermediate data, more than for WordCount. In Fig. 6, the Hadoop TeraSort intermediate data were transferred to the nodes designated as reducers with high network consumption. In contrast, the intermediate data for Spark TeraSort were continuously transferred between nodes after the data were read.
Figure 7 presents the total execution time for Hadoop and Spark TeraSort. Tables S28 and S29 provide detailed TeraSort experimental results across different configurations. The performance discrepancy between the microSD and NVMe SSD is most significant in TeraSort compared with the other benchmarks, as shown by the difference in I/O wait in Fig. 8. For instance, Spark TeraSort has a 2.59\(\times\) performance gap on a single node with 32 GB, whereas Spark WordCount has only a 1.1\(\times\) performance gap, because TeraSort generates more intermediate data and its output is the same size as its input. Thus, TeraSort writes an even larger volume of data to storage than WordCount, accounting for the difference.
Grep
The Grep benchmark evaluates the performance of determining regular expressions or string patterns in extensive data. Hadoop Grep reads the input files and searches each line for the specified pattern during the map phase. The matched lines are transformed into intermediate data as key-value pairs. These intermediate data are passed to the reducer and stored in the final output file. Spark Grep reads the input file to create an RDD. Then, the filter operation checks each line for the specified pattern and filters out only matching lines. Finally, the results are saved to a file.
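A minimal PySpark sketch of the Spark Grep flow described above is shown below; the search pattern and paths are hypothetical.

```python
# Minimal PySpark Grep sketch (search pattern and paths are hypothetical)
import re
from pyspark import SparkContext

sc = SparkContext(appName="GrepSketch")

pattern = re.compile(r"LAX")                               # hypothetical string/regex pattern

lines = sc.textFile("hdfs:///input/2006.csv")              # read the input data as an RDD
matches = lines.filter(lambda line: pattern.search(line))  # keep only lines matching the pattern
matches.saveAsTextFile("hdfs:///output/grep")              # store the matching lines as the result

sc.stop()
```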
In Table 14, only the filtered data are written as intermediate data. Thus, the volume of intermediate data is minimal, meaning the amount of data transferred over the network is also small; hence, the duration of communication between nodes is relatively short (Fig. 9). Consequently, Grep is less affected by storage performance than WordCount or TeraSort.
Figure 10 illustrates the total execution time of the Hadoop and Spark Grep benchmarks. Interestingly, the performance gap between the SSD and microSD in Spark is greater than that in Hadoop. The final result of Hadoop Grep is the number of matching lines, whereas Spark Grep stores the content of the matching lines as the final result. Thus, the performance difference is 1.16\(\times\) in Hadoop and 2.27\(\times\) in Spark on a single node with 16 GB of data.
Tables S30 and S31 present the Grep benchmark results for Hadoop and Spark. The performance gap between the SSD and microSD decreases as the node count increases because the data volume each node processes is gradually reduced. For a small data size (i.e., 1 or 2 GB), the performance does not improve as the number of nodes increases in Spark because additional network data transfer time is required. Moreover, even before the SSD is fully utilized, search processing over a small data size completes quickly.
Pi computation
The Pi computation benchmark measures the performance of calculating Pi (\(\pi\)) on a distributed system. This benchmark typically applies probabilistic algorithms, such as the Monte Carlo method, to estimate the value of Pi. The benchmark focuses on pure computational performance rather than data input and output, which is well-suited for testing the CPU capabilities of a system. Hadoop and Spark employ the Monte Carlo method to estimate the Pi value. The Monte Carlo method generates many random samples and applies a statistical analysis to approximate the complex mathematical problem.
The Pi computation sets up a circle inscribed within a square and generates numerous random points in the square. These points have (x, y) coordinates, each chosen randomly within the square bounds. The circle inequality, \(x^2 + y^2 \le r^2\), is checked to determine whether each point (x, y) lies inside the circle. Finally, the number of points inside the circle and the total number of points are counted, and these values are used to estimate Pi. Hadoop and Spark generate the assigned points in the map phase and determine whether the points are inside the circle. In the reduce phase, Hadoop and Spark aggregate the counts calculated by each mapper or partition to estimate Pi. Table 15 and Fig. 11 reveal that the Pi computation has a high computational workload; thus, minimal intermediate data are generated, and very few data are transferred between nodes.
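Below is a minimal PySpark sketch of the Monte Carlo estimation just described, using the common quarter-circle variant over the unit square; the partition and sample counts are illustrative, and this is a simplified stand-in for the bundled benchmark code.

```python
# Minimal PySpark Monte Carlo Pi sketch (partition and sample counts are illustrative)
import random
from pyspark import SparkContext

sc = SparkContext(appName="PiSketch")

num_partitions = 100
samples_per_partition = 10000
total_samples = num_partitions * samples_per_partition

def inside(_):
    # Draw a random point in the unit square and test whether it lies inside the quarter circle
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0

# Map phase: test each sample; reduce phase: count the points that landed inside the circle
hits = sc.parallelize(range(total_samples), num_partitions) \
         .map(inside) \
         .reduce(lambda a, b: a + b)

print("Pi is roughly", 4.0 * hits / total_samples)

sc.stop()
```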
As listed in Tables 16 and 17, for the Pi computation benchmark, we employed 100 maps with 10,000 samples per map for Hadoop and 10,000 samples for Spark. Figure 12 shows the Pi benchmark results for Hadoop and Spark on the RPi 5B cluster. This benchmark is purely CPU-intensive; hence, there is little performance difference between the storage media (less than 1.06\(\times\)). As the number of nodes increases, the performance also improves accordingly due to the increased total CPU count.
TestDFSIO
The benchmark TestDFSIO evaluates the I/O performance of the HDFS, primarily testing read and write operations. The TestDFSIO write operation creates files of a given size and writes data to the HDFS. Users can specify the file size and count. The benchmark measures the performance of writing data to the file system by creating the specified number of files. The TestDFSIO read operation reads the data from the files stored in the HDFS. The operation measures the read performance when the files are distributed across multiple nodes.
This benchmark is performed on the RPi 5B cluster with eight nodes, using 10 files. File sizes vary from 1 to 32 GB. Three primary performance metrics, throughput, average I/O rate, and execution time, are measured to evaluate the cluster I/O performance. If the throughput is lower than the average I/O rate, it indicates that the overhead, such as the network and CPU, affects the overall benchmark execution process.
Tables 18 and 19 present TestDFSIO performance results of Hadoop and Spark. As the data size increases, the performance gap (i.e., the execution time difference) between the microSD and NVMe SSD also increases by an average of up to 2\(\times\) at 32 GB. Regarding the relationship between the throughput and average I/O rate, Hadoop and Spark displayed similar values for writes. This means that no bottlenecks occur in the cluster when writing data. For reads, the throughput and average I/O values were similar under the microSD, whereas the performance gap noticeably increased to 2.38\(\times\) for Hadoop and 2\(\times\) for Spark under the NVMe SSD with a data size of 32 GB. This result indicates that other performance bottlenecks exist in the cluster. For networks, none of the benchmarks used even one-tenth of the 1 Gbps network bandwidth.
However, for all benchmarks, the microSD exhibited a very high I/O wait percentage, whereas the NVMe SSD had a very low I/O wait (e.g., see Fig. 8). Hence, the CPU could fully drive the SSD, indicating that the difference between the throughput and average I/O rate is caused by the CPU. Therefore, when adopting a fast SSD as the storage medium, better CPU performance accelerates big data processing further. Section Discussion discusses CPU overclocking in more detail.
Raspberry Pi 4B Cluster vs. 5B Cluster
The individual performance of the RPi 4B and 5B was evaluated in Section Individual Raspberry Pi performance. This section explores how individual node performance influences cluster performance. An RPi 4B cluster of five nodes (one master and four worker nodes) was built with the same settings as in the RPi 5B cluster. The microSD (Samsung PRO Plus) was adopted as the storage media because RPi 4B initially provided only a microSD slot. For a fair evaluation, we employed a newly upgraded RPi 4B with more memory (8 GB) and a more powerful CPU (1.8 GHz) than the originally released RPi 4B with up to 4 GB of RAM and a 1.5 GHz CPU.
Tables S32 and S33 list the results of the WordCount benchmark on the RPi 4B and 5B clusters. The performance gap between the RPi 4B and 5B clusters ranges from 2.2\(\times\) to 3.86\(\times\) under Hadoop and from 1.85\(\times\) to 2.33\(\times\) under Spark. The cluster-level performance gap is noticeably larger than the individual CPU performance difference (an average of 1.74\(\times\)) because the CPU performance significantly influences the overall storage I/O performance as well as the computing capability (Table 6).
Figure 13 indicates the total execution time of the RPi 4B and 5B clusters under Hadoop and Spark WordCount. As the node count increases, the performance gap between the RPi 4B and 5B clusters widens under Hadoop compared to Spark because Hadoop WordCount generates more I/O operations, implying that the storage performance is more crucial to Hadoop.
RPi 5B cluster vs. Desktop computer
This section compares the RPi 5B cluster to a desktop computer in terms of big data processing performance and power efficiency. Table 20 lists the specifications of the desktop computer, a modern and powerful machine at a current price point of about $1,000. Unlike RPi 5B, which uses a single PCIe lane, this computer provides four PCIe lanes. Thus, it demonstrates unparalleled I/O performance, with an average of 3.52\(\times\) faster sequential reads and 3.35\(\times\) faster sequential writes at a 16 MB record size (Table S34).
Figure 14 depicts the performance of the Hadoop and Spark benchmarks with 32 GB of data on a single-node RPi 5B, the eight-node RPi 5B cluster, and the desktop computer. Under the Hadoop benchmarks, the desktop computer tends to perform better on I/O-intensive workloads because it vastly benefits from the exceptional SSD I/O performance enabled by its four PCIe lanes. For instance, the computer performed an average of 1.37\(\times\) (WordCount), 1.06\(\times\) (TeraGen), and 1.36\(\times\) (TeraSort) better than the RPi 5B cluster. In contrast, the RPi 5B cluster performed better on CPU-intensive workloads by taking advantage of its higher CPU core count (32 cores in the cluster): 1.37\(\times\) (Grep) and 1.04\(\times\) (Pi). In Spark, the RPi 5B cluster can also benefit from its abundant memory (64 GB of RAM in total) in addition to the total CPU core count. Thus, under Spark, the cluster performed more competitively, comparable to this latest powerful computer. If the next generation RPi further improves I/O performance by increasing the PCIe lane count or upgrading the PCIe version, this eight-node SBC cluster is expected to outperform the single desktop computer on every workload.
Power consumption was also measured for the five benchmarks (Table 21). The eight-node RPi 5B cluster consumed an average of 1.52\(\times\) and 1.71\(\times\) less power under Hadoop and Spark, respectively, than the desktop computer. Table S35 shows the performance-per-watt results for each benchmark for the desktop computer and the RPi 5B cluster. Performance was derived as the throughput per second, based on the data size and execution time, and the performance per watt was calculated as follows: \(\text{performance per watt} = \frac{\text{data size} / \text{execution time}}{\text{average power consumption (W)}}\).
For the performance per watt, the cluster achieved up to 2.52\(\times\) better power efficiency than the desktop computer.
Discussion
This section discusses various challenges affecting RPi 5B performance. Further, this section offers suggestions and insight into the potential performance of future RPi models.
PCIe Version: 2.0 vs. 3.0
Initially, RPi 5B supported PCIe 2.0 by default. Though PCIe 3.0 is not officially certified by the RPi Foundation, it is easily enabled with boot configuration settings by adding dtparam=nvme to /boot/firmware/config.txt to enable the PCIe interface, and adding dtparam=pciex1_gen=3 below it to switch the default from PCIe 2.0 to 3.0. To investigate the effect of PCIe versions, storage performance was measured using iozone. Theoretically, PCIe 2.0 and 3.0 provide a bandwidth of 5 GT/s (500 MB/s) and 8 GT/s (about 985 MB/s) per lane, respectively.
Table 22 presents the sequential read performance of the NVMe SSD: 446.5 MB/s under PCIe 2.0 and 869.47 MB/s under PCIe 3.0. Considering the internal protocol overhead of the SSD, both figures appear reasonable. However, the Hadoop TeraSort (1.01\(\times\)) and Spark TeraSort (1.06\(\times\)) benchmarks did not display a noticeable performance discrepancy between PCIe 2.0 and 3.0, despite the 1.95\(\times\) difference in raw storage bandwidth, primarily due to the CPU bottleneck of the RPi. For verification, the CPU usage of Hadoop TeraSort was measured and reached nearly 100%. This suggests that a future RPi with a more powerful CPU would be able to fully utilize powerful storage media, such as the NVMe SSD. Section CPU overclocking addresses this problem by overclocking the CPU of RPi 5B.
TRIM command
Across repeated runs of identical experiments, performance differences of up to 3\(\times\) were observed. This problem originates from the characteristics of the NAND flash memory-based SSD. Unlike HDDs, SSDs cannot directly overwrite data30,31. A garbage collection process is required to reclaim invalid (i.e., garbage) data blocks, leading to additional read and write operations and degrading performance significantly14,32.
Even if the data are deleted from the file system, the SSD does not recognize that the data were deleted. The OS provides a special mechanism, called a trim command, for SSDs to resolve this problem. The trim command informs the SSD that the data blocks were deleted from the system, allowing the SSD to be aware of unnecessary (i.e., deleted) blocks in advance to improve performance. In Ubuntu 23.10, the fstrim command is set to be called once a week by default.
Tables 11 and 13 in Section Raspberry Pi cluster performance present the characteristics of the WordCount and TeraSort workloads, which generate intermediate data exceeding 100 GB for 32 GB of input data. The intermediate data are deleted after the corresponding job completes, generating a significant volume of garbage data in the SSD for each experiment. Thus, the SSD triggers an expensive garbage collection operation when it reaches a predefined threshold. Therefore, the trim command, such as fstrim in Linux, must be executed for each benchmark run with SSD storage to ensure that the SSD remains in an optimal state. Otherwise, the SSD may perform inconsistently across benchmark executions.
CPU overclocking
The CPU of RPi 5B can be overclocked from 2.4 to 3.0 GHz by adding arm_freq=3000 and over_voltage_delta=50000 to /boot/firmware/config.txt. However, not all RPi 5B boards can be successfully overclocked. In our experiments, only four out of nine overclocked RPi 5B boards operated correctly; the others failed to boot because of the silicon lottery33.
The overclocked CPU performance was measured using sysbench, revealing an average improvement of 1.25\(\times\) (Table 23). The CPU temperature is also crucial for stable CPU performance; the RPi 5B requires an active cooling system, such as a cooling fan. A 400% CPU load was applied for 1 hour (stress --cpu 4 --timeout 3600), and the temperature change was monitored. Figure 15 presents the CPU temperature over time. At idle, the temperature remained around an average of 47.87\(^{\circ }\)C. At 2.4 GHz, the temperature increased to an average of 58.43\(^{\circ }\)C, and at 3.0 GHz, it averaged 72.9\(^{\circ }\)C.
As listed in Table S36, all benchmarks performed better at 3.0 GHz. However, as illustrated in Fig. 16, the performance with the NVMe SSD improved more than that with the microSD, by an average of up to 1.11\(\times\). As mentioned in Section PCIe Version: 2.0 vs. 3.0, the NVMe SSD was not fully utilized due to the CPU performance bottleneck. The overclocked CPU provides improved capabilities, allowing it to better exploit the powerful NVMe SSD. Consequently, both the Hadoop and Spark benchmarks benefit from the improved CPU performance. In contrast, slower storage media, such as the microSD, benefit less from the overclocked CPU because the significantly longer CPU I/O wait time, originating from the lower I/O performance of the microSD card, remains the primary performance bottleneck (Fig. 8).
Power Supply
The importance of the power supply is easily neglected; however, a sufficient power supply is essential. The RPi Foundation sells a dedicated 27 W power supply for the RPi 5B. To investigate the influence of various power supplies, three typical power supplies (12.5, 15, and 27 W) were connected to a single RPi 5B. With the 12.5 W power supply, the RPi 5B shut down while processing big data. The 15 W power supply successfully ran each benchmark, but the benchmarks run at a 3.0 GHz CPU clock performed noticeably worse than those at 2.4 GHz (Fig. 17). Moreover, none of the benchmarks with the 15 W power supply performed as well as those with the 27 W power supply, especially the Hadoop benchmarks (Table S37).
To investigate this problem, we traced the CPU clock speed over time for each power supply. Figure 18 illustrates the CPU clock speed under the Hadoop TeraSort benchmark. Very unstable (i.e., fluctuating) CPU clock speeds were observed under the 15 W power supply because of the power shortage for the overclocked CPU (Fig. 18-(a)), degrading performance. Conversely, the 27 W power supply provides sufficient power, so the CPU clock remains very stable over time at full speed (Fig. 18-(b)).
Massive dataset and parallel processing
This section explores the possibility of using RPi 5B clusters to process terabyte-scale big data. To verify this possibility, two extensive data sizes (1 and 2 TB) were employed, and three benchmarks (WordCount, TeraSort, and Grep) were evaluated on the RPi 5B cluster. For 1 TB, WordCount took 8,662 seconds, TeraSort took 5,311 seconds, and Grep took 773 seconds. For 2 TB, Grep took 1,582 seconds. However, WordCount and TeraSort failed to complete because of insufficient total storage space (4 TB) in the cluster, not because of any computational capability problem. Both WordCount and TeraSort produce a significant volume of intermediate data, whereas Grep generates far less. Thus, if more storage space were provided to the RPi 5B cluster, the WordCount and TeraSort benchmarks would finish successfully, with an expected roughly 2\(\times\) longer execution time, similar to the scaling observed for Grep.
Under the real-world big data processing environment, multiple job application processing is crucial. To evaluate the current processing capabilities of the RPi 5B cluster, we ran four Spark benchmarks (i.e., WordCount, TeraSort, Grep, and Pi) simultaneously with a data size of 4 GB. The cluster successfully finished all jobs, taking 134 seconds. For a more objective comparison, the same four benchmarks were executed sequentially, taking 254 seconds. Parallel processing achieved an average of 1.9\(\times\) faster execution time.
In summary, if more RPi 5B nodes were added to the cluster, it would allow a much larger dataset and more concurrent applications to be processed successfully, verifying the scalability and practicability of the RPi 5B cluster for terabyte-scale big data processing.
Conclusion
This paper extensively examined the possibilities of SBCs for real-world big data processing by adopting the most powerful, latest generation RPi 5B. The RPi 5B model is the first generation of RPi to provide a PCIe interface that enables powerful modern storage media, such as the NVMe SSD, to directly connect to the SBC node via an external HAT board. Thus, the I/O performance of the storage media in RPi 5B has dramatically improved compared to the fastest microSD card, by an average of 9.88\(\times\) for sequential reads and 15.6\(\times\) for sequential writes. Further, the importance of the PCIe interface lies in the potential for expansion, including a faster network card or more powerful GPU installation. The CPU computational capability of RPi 5B has also noticeably improved by an average of up to 2.03\(\times\) at the cost of 1.4\(\times\) more power consumption compared to RPi 4B.
We built an RPi 5B cluster with one master node and eight worker nodes and evaluated six representative Hadoop and Spark benchmarks (WordCount, TeraGen, TeraSort, Grep, Pi computation, and TestDFSIO) to assess the cluster performance. A faster storage device, the NVMe SSD, was employed to evaluate the influence of storage media performance on the SBC Hadoop cluster, improving the overall cluster performance by up to 3.43\(\times\) compared to the current fastest microSD card. The latest RPi 5B cluster performed up to 3.86\(\times\) faster than the RPi 4B cluster under the Hadoop benchmarks. This cluster-level performance gap is noticeably larger than the individual CPU performance gap between RPi 5B and 4B (with 8 GB of RAM), an average of 1.74\(\times\), because the CPU performance also significantly affects the overall storage I/O performance in addition to the computing capability. The RPi 5B cluster with eight worker nodes performs comparably to the desktop computer. For instance, the desktop computer tends to perform better under I/O-intensive workloads due to the 4\(\times\) higher throughput of its NVMe SSD. In contrast, the RPi 5B cluster performed better under CPU-intensive workloads by utilizing more CPU cores. Regarding the performance per watt, the power efficiency of the cluster was up to 2.52\(\times\) better than that of the desktop computer.
In addition, this paper discusses diverse challenges and makes suggestions. The I/O bottleneck problem of the RPi was finally resolved using the PCIe interface of RPi 5B. However, although the CPU of RPi 5B was upgraded noticeably (2.03\(\times\) faster than the previous RPi), the CPU performance has become a bottleneck that prevents fully exploiting high-performance storage media (i.e., the NVMe SSD). The CPU overclocking experiments verified this limitation. We also found that a sufficient power supply is essential for stable, full-speed RPi performance, preventing CPU clock fluctuations. Finally, the RPi 5B Hadoop cluster demonstrated its practicality for real-world big data by effectively and efficiently processing terabytes of data. The data size is no longer a limitation because RPi 5B exhibited very stable performance without system hangs, unlike previous generations.
In summary, the PCIe interface of the RPi 5B is exceptionally beneficial. Previous generation RPis had critical limits for real-world big data processing, primarily due to the storage I/O bottleneck or insufficient memory space. Other I/O interfaces, such as microSD card slots or USB ports, cannot provide sufficient I/O throughput to accommodate massive volumes of data. The RPi 5B model also meets another critical requirement for clusters: scalability. The RPi 5B Hadoop cluster effectively expanded its processing capabilities in terms of computing and storage as the RPi 5B node count increased (i.e., scale-out). These significant advancements, combining the powerful RPi 5B with a fast PCIe-based SSD, offer 'real' possibilities for small, terabyte-scale big data processing.
Despite these considerable improvements, several limitations emerged under I/O-intensive workloads. To process larger volumes of data more quickly, the following aspects need to be improved. First, CPU performance and core count need to be increased: switching the storage media from microSD to SSD resolved the previous I/O bottleneck, but under I/O-intensive workloads the CPU itself became the bottleneck. Second, the SSD is not utilized to its full potential; adopting a higher PCIe version or providing additional lanes would extract even more performance from the storage device. Third, memory capacity is a limiting factor. In this experiment, an RPi 5B with 8 GB of memory was used, and in Spark a growing data volume leads to more frequent spilling to disk when memory is insufficient, so a larger memory capacity would likely yield better Spark benchmark results. Through this study, we found that SBCs still have significant potential for further advancement in the field of big data processing.
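For context on the memory limitation, the following is a hypothetical configuration sketch (not the authors' settings): on an 8 GB RPi 5B worker, Spark executor memory must leave headroom for the operating system and Hadoop daemons, which bounds how much data can stay in memory before spilling to disk. The specific values here are assumptions for illustration.

# PySpark sketch of memory-related settings on an 8 GB worker node.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rpi5b-cluster-sketch")
    .config("spark.executor.memory", "6g")     # assumed per-node budget, leaving ~2 GB for OS/daemons
    .config("spark.executor.cores", "4")       # the RPi 5B has four Cortex-A76 cores
    .config("spark.memory.fraction", "0.6")    # Spark's default split for execution and storage memory
    .getOrCreate()
)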
Data availability
The datasets generated and/or analyzed during the current study are available in the Harvard Dataverse repository, https://doi.org/10.7910/DVN/HG7NV7.
References
Nazari, E., Shahriari, M. H. & Tabesh, H. Big data analysis in healthcare: Apache Hadoop, Apache Spark and Apache Flink. Front. Health Inform. 8, e14. https://doi.org/10.30699/fhi.v8i1.180 (2019).
Hasan, M., Popp, J. & Oláh, J. Current landscape and influence of big data on finance. J. Big Data 7, 21. https://doi.org/10.1186/s40537-020-00291-z (2020).
Kim, G.-H., Trimi, S. & Chung, J.-H. Big-data applications in the government sector. Commun. ACM 57, 78–85. https://doi.org/10.1145/2500873 (2014).
Lim, S. & Park, D. Improving Hadoop MapReduce performance on heterogeneous single board computer clusters. Future Gener. Comput. Syst. 160, 752–766. https://doi.org/10.1016/j.future.2024.06.025 (2024).
Mavridis, I. & Karatza, H. Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark. J. Syst. Softw. 125, 133–151. https://doi.org/10.1016/j.jss.2016.11.037 (2017).
Ni, J. & Bai, X. A review of air conditioning energy performance in data centers. Renew. Sustain. Energy Rev. 67, 625–640. https://doi.org/10.1016/j.rser.2016.09.050 (2017).
Liu, Y. et al. Energy consumption and emission mitigation prediction based on data center traffic and PUE for global data centers. Glob. Energy Interconnect. 3, 272–282. https://doi.org/10.1016/j.gloei.2020.07.008 (2020).
Johnston, S. J. et al. Commodity single board computer clusters and their applications. Future Gener. Comput. Syst. 89, 201–212. https://doi.org/10.1016/j.future.2018.06.048 (2018).
Adnan, A., Tahir, Z. & Asis, M. A. Performance evaluation of single board computer for Hadoop distributed file system (HDFS). In 2019 International Conference on Information and Communications Technology (ICOIACT), 624–627. https://doi.org/10.1109/ICOIACT46704.2019.8938434 (2019).
Severance, C. Eben Upton: Raspberry Pi. Computer 46, 14–16. https://doi.org/10.1109/MC.2013.349 (2013).
Karthikeyan, S. et al. A systematic analysis on Raspberry Pi prototyping: Uses, challenges, benefits, and drawbacks. IEEE Internet Things J. 10, 14397–14417. https://doi.org/10.1109/JIOT.2023.3262942 (2023).
LibreELEC. LibreELEC support for Raspberry Pi 5. https://libreelec.tv/2023/09/28/rpi5-support/ (2023). (Accessed 15 July 2024).
Park, D., Wang, J. & Kee, Y.-S. In-storage computing for Hadoop MapReduce framework: Challenges and possibilities. IEEE Trans. Comput. 1–14. https://doi.org/10.1109/TC.2016.2595566 (2016).
Park, D. & Du, D. H. Hot data identification for flash-based storage systems using multiple Bloom filters. In IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST), 1–11. https://doi.org/10.1109/MSST.2011.5937216 (2011).
Jung, M. Exploring design challenges in getting solid state drives closer to CPU. IEEE Trans. Comput. 65, 1103–1115. https://doi.org/10.1109/TC.2014.2366772 (2016).
Liao, X., Lu, Y., Yang, Z. & Shu, J. Efficient crash consistency for NVMe over PCIe and RDMA. ACM Trans. Storage 19. https://doi.org/10.1145/3568428 (2023).
Bougioukou, E., Ntalla, A., Palli, A., Varsamou, M. & Antonakopoulos, T. Prototyping and performance evaluation of a dynamically adaptable block device driver for PCIe-based SSDs. In 2014 25th IEEE International Symposium on Rapid System Prototyping, 51–57. https://doi.org/10.1109/RSP.2014.6966692 (2014).
Miori, L., Sanin, J. & Helmer, S. A platform for edge computing based on Raspberry Pi clusters. In Data Analytics (Eds. Calì, A., Wood, P., Martin, N. & Poulovassilis, A.), 153–159 (Springer International Publishing, Cham, 2017).
Tso, F. P., White, D. R., Jouet, S., Singer, J. & Pezaros, D. P. The Glasgow Raspberry Pi cloud: A scale model for cloud computing infrastructures. In 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops, 108–112. https://doi.org/10.1109/ICDCSW.2013.25 (2013).
da Silva, L. F. & Lima, J. V. F. An evaluation of relational and NoSQL distributed databases on a low-power cluster. J. Supercomput. 79, 13402–13420. https://doi.org/10.1007/s11227-023-05166-7 (2023).
Zhao, S., Zhu, S., Wu, Z. & Jaing, B. Cooperative energy dispatch of smart building cluster based on smart contracts. Int. J. Electr. Power Energy Syst. 138, 107896. https://doi.org/10.1016/j.ijepes.2021.107896 (2022).
Rahmat, R. F., Saputra, T., Hizriadi, A., Lini, T. Z. & Nasution, M. K. Performance test of parallel image processing using Open MPI on Raspberry Pi cluster board. In 2019 3rd International Conference on Electrical, Telecommunication and Computer Engineering (ELTICOM), 32–35. https://doi.org/10.1109/ELTICOM47379.2019.8943848 (2019).
Hawthorne, D., Kapralos, M., Blaine, R. W. & Matthews, S. J. Evaluating cryptographic performance of Raspberry Pi clusters. In 2020 IEEE High Performance Extreme Computing Conference (HPEC), 1–9. https://doi.org/10.1109/HPEC43674.2020.9286247 (2020).
Qureshi, B. & Koubaa, A. On energy efficiency and performance evaluation of SBC based clusters: A Hadoop case study. arXiv:1903.06648 (2019).
Lee, E., Oh, H. & Park, D. Big data processing on single board computer clusters: Exploring challenges and possibilities. IEEE Access 9, 142551–142565. https://doi.org/10.1109/ACCESS.2021.3120660 (2021).
Dunn, J. The best microSD cards in 2024. https://www.engadget.com/best-microsd-card-130038282.html (2024). (Accessed 19 July 2024).
Tallis, B. The best NVMe SSD for laptops and notebooks: SK hynix Gold P31 1TB SSD reviewed. https://www.anandtech.com/show/16012/the-sk-hynix-gold-p31-ssd-review/7 (2020). (Accessed 19 July 2024).
Yi, X., Liu, F., Liu, J. & Jin, H. Building a network highway for big data: Architecture and challenges. IEEE Network 28, 5–13. https://doi.org/10.1109/MNET.2014.6863125 (2014).
Data Expo 2009: Airline on time data. https://doi.org/10.7910/DVN/HG7NV7 (2008).
Park, D., Wang, J. & Kee, Y.-S. In-storage computing for Hadoop MapReduce framework: Challenges and possibilities. IEEE Trans. Comput. 1–13. https://doi.org/10.1109/TC.2016.2595566 (2016).
Ha, H., Shim, D., Lee, H. & Park, D. Dynamic hot data identification using a stack distance approximation. IEEE Access 9, 79889–79903. https://doi.org/10.1109/ACCESS.2021.3084851 (2021).
Park, D., Debnath, B. & Du, D. CFTL: A convertible flash translation layer adaptive to data access patterns. SIGMETRICS Perform. Eval. Rev. 38, 365–366. https://doi.org/10.1145/1811099.1811089 (2010).
Geerling, J. Raspberry Pi 5 *can* overclock to 3.14 GHz. https://www.jeffgeerling.com/blog/2024/raspberry-pi-5-can-overclock-314-ghz (2024). (Accessed 30 July 2024).
Funding
This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2025-02214652, Development of SoC Technology for AI Semiconductor-Converged Pooled Storage/Memory, 50%) and funded by the MSIT (Ministry of Science and ICT), Korea, under the Convergence security core talent training business support program (IITP-2025-RS-2023-00266605, 50%).
Author information
Contributions
Y.L. designed and performed the experiments. Y.L. analyzed the data and wrote the manuscript. D.P. supervised the research and revised the manuscript. All authors reviewed and approved the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lee, Y., Park, D. Exploring the effects and potential of unlocked I/O-powered single board computer clusters. Sci Rep 16, 4486 (2026). https://doi.org/10.1038/s41598-025-34623-x