Introduction

Multipliers are one of the most important components of computer science. They are often used in image processing, scientific computing, digital signal processors, computer graphics, and other areas1. A lot of multimedia and signal or image processing programs don’t need very accurate or relatively accurate calculations2. Approximate compressors are used to make circuits that are meant to work quickly work better since they can handle errors and make output that people can understand. High-speed multimedia applications have opened up a new area for high-speed fault-tolerant circuits that use approximation computation. These apps aren’t particularly precise, even though they work quickly. Less latency, less energy utilization, and a simpler system architecture are some of the other benefits3. To multiply, you must do these three things in this order: The first step is to make some of the product, the second step is to make less of the product, and the last step is to add more of the product. The dadda algorithm is a parallel structure multiplier that works much faster than array multipliers like Booth, Braun, Baugh-Wooley, and others. However, putting it into hardware requires more gates and uses more energy4.

The digit reduction phase of an n×n multiplier takes the digits that were made and turns them into a two-digit number. In its multi-stage implementation5, the carry-save adder performs this by using an array of adders or compressors. This step is the most important and useful one for reducing delay, energy use, and area in multipliers. It also uses a lot of compressors to lower the digit produced in the second step of flexible high-speed error programs. Many other types of compressors, such as 3:2, 4:2, and 5:2, have been used a lot to lower the bit rate.

TSV-based ultra-compact branch coupler, which is important for high-speed chip integration, although mostly in the RF and circuit packaging fields6. Negative differential resistance in 2D heterostructures, which can be effective in the development of nanoelectronic devices7. Interface ASIC for MEMS gyroscope designed with high accuracy and stability, and is an example of efficient hardware architectures8. Design of an accelerometer-based PUF for IoT security, which demonstrates the importance of innovation in lightweight and optimized circuit architectures9.

Researchers have recently identified compressors with a high number of inputs, such as 5:2 and 7:2 compressors. In our earlier study, we implemented an 8 × 8 multiplier using 5:2 compressors10. The majority of studies on multi-inlet compressors have focused on improving the circuit architecture for low-power or high-speed uses. However, the prior work’s multiplier design includes a lot of stages to minimize the numerical value and a high error rate, and the compressor design is very complicated. Power consumption and latency are two issues that multipliers with a larger number of bits are confronting. So, two new approximate multipliers are introduced in this article as a continuation of the prior work. Their architecture utilizes the design of two 4:2 and 7:2 compressors that use fewer transistors. This approach makes use of current-mode-control compressors, which boost the multiplier circuit’s speed via pass transistors, transform current into voltage, and alter the logical level of the output voltage. current mode design of logic circuits results in a very good reduction of output latency. As a result, the design of the compressors that make up the multiplier circuit’s main component will greatly aid in enhancing the circuit’s efficiency in the current mode. However, the main issue with the current mode logic circuit architecture is the comparatively large power losses in consumption. We have utilized pass transistors and current control logic in this essay to overcome this problem. The pass transistor technique yields circuits with the fewest transistors when compared to other techniques. However, the output voltage level designed for logic 0 and 1 in these circuits lacks an appropriate threshold, which is the source of the problem. It’s noteworthy to note that we were able to fix this issue by integrating current mode control with voltage conversion and logic level correction for the inverter-based output voltage. Following the creation of several compressors in this study, two novel architectures for various 8 × 8 and 16 × 16 multiplications are suggested. The main innovation of this design is that it uses suggested architectures with 4:2 and 7:2 compressors to minimize the transfer number in fewer stages. We made an effort to optimize the amount of circuits and compressor stages in these designs. The performance of the multiplier in the multiplication of two 8-bit and 16-bit images has been examined using Matlab simulation. The circuit construction, power consumption, and delay in the suggested circuits have all been examined using HSPICE software and 32 nm CNTFET technology. The following constitutes this article’s primary design contribution to the suggested multiplier:

  • Making use of the pass transistor logic and inverter-based voltage level correction in current mode control design techniques to decrease the number of transistors in various compressor circuits, thereby lowering power consumption and delay.

  • Considering the noise margin significance in error-resistant applications and its ability to capitalize on a high noise margin, the 7:2 compressor is incorporated into the plan of the more accurate approximate multiplier.

  • To improve the accuracy of the suggested approximation multiplier, the least error-prone approximate compressors are developed while balancing low latency and power consumption.

  • Development of a new multiplication method for 8 × 8 and 16 × 16 that includes steps to decrease the number of digits and compares the MATLAB software architecture for an imaging application with previous works regarding quality.

  • Examining the transistor count, speed, and power consumption obtained from the recommended compressor design.

The subsequent sections of this essay are organized as follows: Sect. 2 provides a concise summary of the background of pertinent articles and foundational concepts. Section 3 elucidates the proposed methodologies and protocols for delineating the compressor design and the recommended multiplier technique. The article’s findings are delineated in the fifth section, following the execution of simulations and the analysis of outcomes in the fourth section.

Research background

Related works

Technology is advancing more quickly these days. Specifically, based on their intended function and personal preferences, electronic gadget use is growing in a multitude of ways. Because they make it possible for us to carry out intricate mathematical operations involving big numbers more quickly, multipliers are essential in this sense. Multiplication enables us to complete operations in one step and in the quickest amount of time, which is advantageous in today’s environment, as opposed to executing a series of addition or subtraction operations11. Multiplications are not only more efficient, but they also have real-world uses in areas like cryptography, computer science, and engineering. For instance, they are employed in digital circuit design and data encryption and decryption. For rapid execution of complicated computations, multiplication is a crucial tool in mathematics and its applications12. To achieve high-speed beats, compressors are required. Prior research has shown that designing compressors with full adders yields correct results. This is followed by the introduction of approximately 4:2 and 5:2 compressors, which reduce ADP by 52% and latency by 18%. Improving multiplication speed is the primary goal of the present effort, which is primarily focused on developing a compressor-based 7:2 approximation. When contrasted with the 4:2 and 5:2 compressor-based approximation factors, the suggested technique is anticipated to offer the greatest improvement in speed while simultaneously reducing area, delay, and power usage. Here we took a look back at the efforts put into designing compressors and approximate multipliers13.

Approximation has emerged as a promising paradigm for power-efficient and high-speed computational circuits in fault-tolerant applications such as image processing and neural networks. Early work on approximation multipliers focused on CMOS-based designs and introduced techniques such as truncation and partial product puncturing to reduce power and area at the expense of accuracy. For example14, proposed 4:2 approximation compressors with simplified truth tables that yielded up to 50% power savings, but with a higher error rate compared to the proposed design. Similarly15, developed low-power approximation multipliers using error recovery modules, but their CMOS-based approach resulted in higher latencies and power compared to the proposed CNTFET-based compressors. These works, although effective in CMOS, lack the advantages of emerging technologies such as CNTFETs that offer high carrier mobility and ballistic transfer for superior performance. Recent advances in CNTFET-based approximation multipliers have exploited the unique properties of this technology, such as ballistic transition and n/p-type symmetric characteristics, to increase efficiency. In16, it was shown that the near-ballistic transition of CNTFETs allows high current conduction at low voltages (e.g., 0.9 V) while maintaining speed while reducing power consumption, a feature that the proposed design with a unit current of 100 nA in CML for very low power takes advantage of17 proposed a voltage overscaling (VOS)-based Dadda multiplier in CNTFETs that achieves significant power reduction (up to 60%) but with limited error resolution for high input scenarios, in contrast to the proposed design’s optimized error limiting to low-significant bits. In18, we introduced multi-column imprecise compressors for 8-bit multipliers that reduce the power by 30% compared to our 43.2 ps for 7:2, but with a higher error rate (MRED 50 ps) compared to our 43.2 ps for 7:2. In contrast to these works, our approach integrates CML with pass transistor logic to minimize the number of transistors and optimize the PDP, while maintaining high accuracy. The research gap is the lack of CNTFET-based approximate compressors that simultaneously achieve low PDP, minimal transistor count, and robust error control for multi-input scenarios, which the proposed design addresses through a novel combination of CML, pass transistor logic, and application-specific error optimization.

To minimize area, delay, and power while maintaining possible accuracy, two approximate compressors were designed and analyzed in the article. The size, power, and latency of the proposed 4:2 approximate compressor in19 are all reduced by 56.80%, 57.20%, and 73.30%, respectively, as compared to the precise 4:2 compressor.

The suggested approach synthesizes approximate multipliers for a variety of operand lengths from a 40 nm library. An approximate 15:4 compressor design based on a 5:3 compressor is presented in the study14. For reduced power consumption and high passage speed, the 15:4 compressor makes use of four distinct kinds of about 5:3 compressors. In order to replicate a 16-bit multiplication, the proposed 15:4 compressor is used. The simulation findings reveal that the power improvement above the exact compressor multipliers of 15:4 is substantial when using the multipliers with the suggested approximate compressors. The suggested multipliers outperform existing approximate multipliers in terms of passing rate. Lastly, the suggested multiplier finds usage in Image processing algorithms that assess the image’s peak signal-to-noise ratio (PSNR). In comparison to current architectures, the suggested approximate compressor in20 uses less power, has a shorter latency, and takes up less space. Consequently, a 5:2 compressor is recommended after analyzing two distinct 4:2 compressors. The recommended compressors are used to implement the 8 × 8 and 16 × 16 multipliers. More research is being done on the method utilized in fault-tolerant applications such as picture multiplication and smoothing.

An approach for current routing in SST submodules that helps increase the reliability of power circuits21. A self-packaged flyback converter with an embedded transformer that demonstrates the optimization of industrial electronic circuits22. A 4D chaotic system with dynamic analysis and control circuit implementation that can inspire the design of nonlinear circuits23. Synchronization of neurons with memristive membranes that highlights the importance of nonlinear and energy-efficient mechanisms for modern circuits24.

Recent research has shown that, when it comes to designing multiplication algorithms for various uses, every article is focusing on one thing: finding the sweet spot between power consumption, latency, and accuracy. Consequently, in this article, we will learn how to use the pass transistor technique to reduce design complications in multipliers’ circuit structures, control the current mode, modify the logic level of the output voltage based on an inverter, and perform 8 × 8 and 16 × 16 multiplication. Its application was based on the 7:2 and 4:2 compressor ratios. Additionally, the creation of multiplier architectures with the fewest stages of transfer digit reduction was utilized to create new, very accurate architectures that reduced the error and level of path transistors to an acceptable level using CNTFET technology.

Rationale for CNTFET technology in proposed design

The proposed 4:2 and 7:2 approximation compressors are designed using carbon nanotube field-effect transistor technology due to its superior electrical properties, which provide significant improvements in power efficiency, speed, and robustness for low-power and fault-tolerant applications such as image processing multipliers. CNTFETs offer high carrier mobility and low threshold voltages, which enable the use of low unit current in current-mode logic (CML) to achieve very low power consumption while maintaining high speed compared to CMOS-based designs. The mechanical correlation properties of carbon nanotubes, such as uniform chirality and diameter control, ensure a constant current in the transistors, reduce the variability in the algebraic sum of the CML current, and enable precise error control in the approximation pattern. Furthermore, the symmetrical electrical characteristics of n- and p-type CNTFETs, with nearly identical I-V curves due to the bandgap symmetry25, facilitate balanced differential pair operation in CML, increase noise margin, and minimize short-circuit currents, which are critical for the proposed pass transistor-based topology. The choice of CNTFET technology also takes into account process-specific variations, which are effectively managed to ensure design reliability. Unlike CMOS, where process variations significantly affect threshold and leakage voltages, the low sensitivity of CNTFETs to diameter variations results in minimal performance degradation, with latency and power variations of less than 9%24. The proposed design takes advantage of these features by optimizing transistor size and inverter-based voltage correction to reduce subthreshold swing variations, which are less than those of approximate CMOS-based designs2. Furthermore, the high on/off current ratio of the CNTFET (~ 10^5) supports the low-power performance required for multimedia applications, enabling a 58% power reduction compared to10 and a 94% lower PDP than14. By exploiting these unique features of the CNTFET, the proposed compressors achieve a superior balance between accuracy and efficiency, justifying the technology choice over traditional CMOS or other emerging technologies for the intended application domain.

A concise overview of CNTFET

Carbon nanotubes’ semiconducting and electrically conductive properties make them perfect for use in the production of transistors. Researchers use single-walled nanotubes to construct transistors because these materials outperform multi-walled ones in several important ways, including conductivity, bandwidth, reliability (i.e., the likelihood that the nanotube will be conductive or semiconducting), and controllability. Figure 1 shows example structures of these transistors. The two most important properties of these transistors are the threshold voltage and the diameter of the nanotube, which is both, governed by Eqs. 1 and 2.

$$\:{\varvec{D}}_{\varvec{C}\varvec{N}\varvec{T}}=\frac{\sqrt{3{\varvec{a}}_{0}}}{\varvec{\pi\:}}\sqrt{{\varvec{n}}^{2}+{\varvec{m}}^{2}+\varvec{n}\varvec{m}}\:\:\:\:\:\:\:\:\:$$
(1)
$$\:{\varvec{V}}_{\varvec{t}\varvec{h}}=\frac{\sqrt{3{\varvec{a}}_{0}}}{3\varvec{e}\varvec{d}}{\varvec{\nu\:}}_{\varvec{\pi\:}}\:\:\:\:\:\:\:\:\:$$
(2)

Owing to the utilization of the chiral vector in the aforementioned relations is set to (m,0), where d is the carbon nanotube’s diameter, e is the electron’s electric charge, and a0 = 0.142 nm is the atomic distance between each carbon atom22.

Fig. 1
figure 1

3D view of CNTFET.

Compressors

Compressors are one of the most crucial components in the construction of multipliers, as previously stated. For normal multipliers, 4:2 compressors and 5:2 compressors are the most crucial, whereas larger or more precise multipliers require 7:2 compressors. Compressors are also built using inverters that have variable voltage thresholds and the current-to-voltage conversion method, as mentioned in this article and in references [23 and 24]. The aforementioned items are examined in this section.

Compressor 4:2

Addition is often accomplished by 4:2 compressors. Five inputs and three outputs make up a precise 4:2 compressor, as seen in Fig. 2a. The Sum output8 has the same value as the four inputs, x1, x2, x3, x4, and cin. By using identical modules from the previous level as its input, a precision 4:2 compressor produces an output with a higher value for the subsequent stage. Figure 2b shows how to link two accumulators in series to create a 4:2 compressor. The logic of the 4:2 compressor is represented by Eq. (3), where ci is the transfer digit that enters the compressor from the preceding step with a reduced value and xi is the adder’s primary input26.

$$\:{\varvec{x}}_{1}+{\varvec{x}}_{2}+{\varvec{x}}_{3}+{\varvec{x}}_{4}+{\varvec{c}}_{\varvec{i}\varvec{n}}=\varvec{s}\varvec{u}\varvec{m}+2.\left(\varvec{c}\varvec{a}\varvec{r}\varvec{r}\varvec{y}+{\varvec{c}}_{\varvec{o}\varvec{u}\varvec{t}}\right)\:\:\:\:\:\:\:\:\:\:$$
(3)
Fig. 2
figure 2

(a) The exact 4:2 compressor illustrated (b) An analysis of a 4:2 precise compressor sample based on logic26.

Compressor 7:2

Figure 3 displays a generic block schematic of 7:2 compressors. Figure 3-b depicts a 7:2 compressor with an all-adder architecture. Seven inputs (A1, A2, A3, A4, A5, A6, A7) and two carriers from the preceding stage (Cin1, Cin2) make up the compressor. Its four outputs are an SUM and five carry. The combination of a 4:2 compressor and full adder composed leads in a delay of 0.52 ns and a power usage of roughly 0.264 microwatts for the 7:2 compressor, which is higher than for the 4:2 compressor. A three-stage accelerator for polynomial multiplication that optimizes computational power and latency by utilizing an efficient memory architecture; this is directly aligned with the goals of approximating circuit design and hardware optimization27. A deep belief network for magnetic encoder error compensation that increases measurement accuracy28. Differential neural networks are used to learn and predict current in a PMSM motor drive, demonstrating that computational methods can improve the performance of complex systems with high accuracy and lower cost29. A hybrid random number decoding method that prevents bit-width stacking and increases fault tolerance of digital circuits30. Approximate compressors have been introduced to decrease the delay. The following equation represents the logical structure of a 7:2 compressor.

$$\:{\varvec{x}}_{1}+{\varvec{x}}_{2}+{\varvec{x}}_{3}+{\varvec{x}}_{4}+{\varvec{x}}_{5}+{\varvec{x}}_{6}+{\varvec{x}}_{7}+{\varvec{c}}_{\varvec{i}\varvec{n}1}+{\varvec{c}}_{\varvec{i}\varvec{n}2}+{\varvec{c}}_{\varvec{i}\varvec{n}3}+{\varvec{c}}_{\varvec{i}\varvec{n}4}\:=\:\varvec{S}\varvec{u}\varvec{m}+2\times\:(\varvec{C}\varvec{a}\varvec{r}\varvec{r}\varvec{y}+{\varvec{c}}_{\varvec{o}\varvec{u}\varvec{t}1}+{\varvec{c}}_{\varvec{o}\varvec{u}\varvec{t}2}+{\varvec{c}}_{\varvec{o}\varvec{u}\varvec{t}3}+{\varvec{c}}_{\varvec{o}\varvec{u}\varvec{t}4})\:\:\:\:\:\:\:\:\:\:$$
(4)

A 7 to 2 compressor, like a 4 to 2 compressor, can also be created using adders, which can be seen in Fig. 3-b.

Fig. 3
figure 3

(a) An illustration of a precise compressor 7:2. (b) An analysis of a 7:2 precise compressor sample based on logic1.

Logic circuit with pass transistors

The term “pass-transistor logic” (PTL) in electronics refers to a number of logic families that are employed in integrated circuit design. The number of transistors needed to create various logic gates is decreased by getting rid of unnecessary transistors. Rather than acting as switches that are directly coupled to supply voltages, transistors are employed as switches to transfer logic levels between circuit nodes29. The output of every series transistor is less saturated than the input30. To restore the signal voltage to its full value when multiple devices are connected in series within a logic route, a typically designed gate could be needed. Conventional CMOS logic, on the other hand, switches the transistors such that one of the power supply rails is connected to the output (similar to an open-collector design), so the logic voltage levels do not drop in a sequential chain. To guarantee correct functioning, circuit simulations could be necessary.

A high-stability permanent magnet-based surface velometer is introduced, which improves the accuracy of measurement systems31. A novel circuit for sharpening the edge of bipolar pulses using diode reverse recovery, which is important in high-speed applications32. Analysis and design of an RCD circuit for suppressing oscillations in a Marx pulse generator, which increases the stability and reliability of high-power pulse systems33.

Electronic switches known as MOSFET pass transistors can be configured to either turn on or off their drain-source path based on the gate voltage signal. Since pass transistors lack level restoration and have a small non-zero resistance in their conduction path, the RC delay to charge the next logic stage’s input capacitance which includes both the gate capacitance and the parasitic capacitor is increasing as the voltage levels become logic-high or logic-low. As a result, we must address two issues in order to build these circuits: delay and proper logic voltage level. By using a voltage level correction system based on an inverter circuit, we hope to solve the level restoration problem in this study. Additionally, we are able to effectively manage the impacts of the parasitic capacitor and decrease the delay by utilizing an active load in a current to voltage level conversion system.

CML common mode logic method

A current unit in CML is responsible for implementing the logical value “1”. In this article, 100 nA is used as the unit of current. Figure 4 presents the fundamental methodology for CML circuits. The linear summation is accomplished by connecting the input currents together. As a result, multi-level voltages and currents proportionate to the input signal values will exist33.

There are various unit currents for this purpose, but the justification for choosing a unit current of 100 nA in CML can be stated as follows.

In CML logic, a unit current is used to represent a logical value of “1”, and the input currents are summed together to create multiple current/voltage levels. The selection of a small unit current such as 100 nA is mainly done to achieve a balance between low power consumption, high speed, and reduced circuit complexity in emerging technologies such as CNTFET. The main reasons are: 1- Reduced power consumption: Small currents in CML lead to reduced static power, since the power consumption is proportional to the unit current (P ≈ Vdd × I_unit). In this paper, the main objective is to design approximate compressors for low-power applications such as image processing in energy-constrained multimedia systems. By choosing 100 nA, the power consumption of the 4:2 compressor is reduced to 0.164 µW and the 7:2 compressor to 0.873 µW. 2 - Compatibility with CNTFET technology: CNTFETs allow operation at nanoampere currents due to their high carrier mobility and high drive current at low currents. The Stanford CNTFET model (used in the HSPICE simulation) shows that the nanotube diameter and threshold voltage allow handling of small currents without significant voltage drop. The current of 100 nA was chosen based on previous studies in CNTFET, such as21, which used similar currents for hybrid compressors and reported low power. 3- Speed ​​and noise optimization: Smaller current improves noise margin. Furthermore, in CML, the circuit speed (delay) is proportional to 1/I_unit (because the charge/discharge time of parasitic capacitors depends on the current). But in CNTFET, the low channel resistance (due to the quantum properties of nanotubes) allows even with a current of 100 nA, low delays (such as 21.4 ps for 4:2) to be achieved, which is consistent with our simulations.

Fig. 4
figure 4

CML technique using an inverter for threshold detection33.

Current mode control logic explanation

Current mode logic (CML), often referred to as current-mode control logic in this context, is a differential digital logic family that operates by switching current levels rather than voltage levels to represent binary states. Unlike traditional voltage-mode logic (e.g., CMOS), CML uses a constant current source to bias the circuit, where logical “1” is represented by a unit current (100 nA in our design) and “0” by its absence, enabling high-speed operation with reduced voltage swings and improved noise margins9,33. The primary motivation for integrating CML in our proposed compressors is to address the high power consumption inherent in conventional current-mode designs, which stems from static current flow. By employing controlled current summation and active loads for current-to-voltage conversion, CML minimizes dynamic power while maintaining fast switching, making it suitable for energy-constrained applications like image multipliers. In Sect. 1, this logic is highlighted as a solution to power issues, where current control techniques such as precise unit current selection and inverter-based threshold detection reduce overall dissipation by limiting current paths and optimizing logical levels, achieving up to 58% power savings compared to prior works10.

In the proposed approximate compressors, CML integrates seamlessly with pass-transistor logic to enhance functionality. For the 4:2 compressor, inputs are converted to currents, summed algebraically at shared nodes, and then transformed back to voltage outputs via inverters for level correction, ensuring full-swing signals despite pass-transistor voltage drops30. This control mechanism regulates current flow to prevent excessive leakage, addressing the power overhead by confining current to active paths only. Similarly, the 7:2 compressor cascades 4:2 and 3:2 blocks, where CML’s current summation handles multi-input compression with minimal delay (43.2 ps), while the approximation pattern introduces controlled errors that are mitigated by the logic’s high noise tolerance. The current mode control thus operates by dynamically adjusting current levels based on input combinations, reducing static power through low unit currents and improving speed via parallel current processing, as validated in simulations.

Overall, this integration of current mode control logic with the approximate compressors provides a balanced trade-off between power, delay, and accuracy, outperforming traditional designs in PDP. By focusing current control on error-tolerant paths, the proposed architecture preserves perceptual quality in image applications (Sect. 22), extending the brief mention in Sect. 1 with detailed implementation here. Future refinements could explore adaptive current biasing for further power optimization under PVT variations.

Proposed approximate compressor and multiplier design

Chip surface area and power consumption are significantly impacted by the intermediate connections made between circuits and gates. Due to the special properties of the current mode, fewer connections can be made, which lowers the chip’s surface area and accelerates the circuit by summing the currents algebraically. Using the suggested CML design, compressors operating in flow mode were suggested in this work. Since the design of 4:2 and 7:2 approximation compressors relies on the approximation error and the truth table of the approximation mode, these details are presented first. Next, each predicted compressor’s logical circuit form and circuit function are displayed at the gate level. The approximations described in this article are defined and designed at the transistor level using CML technology, which is based on carbon nanotubes. HSPICE simulation is used to examine whether basic gates and precision compressors are operating correctly in current mode. After developing the compressor architecture, the architecture of the suggested approximation multiplier is explained in this part. In comparison to the exact multiplier, it is evident that the outputs of this multiplier architecture are fairly accurate, despite the fact that the 4:2 and 7:2 compressor parts used in this article’s recommended multiplier are all approximate.

Proposed 4:2 and 7:2 compressor design

Full adder design as a 3:2 compressor

Table 1 defines the truth logic of a full adder. The proposed logic was designed for all error-free input modes. The final logic along with the circuit under CNTFET technology and at the gate level is shown in Fig. 5. Relations 1 and 2 describe the logic model of a full adder for the outputs of the circuit.

$$\:\varvec{S}\varvec{u}\varvec{m}\:=\:\varvec{X}0\:\varvec{x}\varvec{o}\varvec{r}\:\left(\:\varvec{X}1\:\varvec{x}\varvec{o}\varvec{r}\:\varvec{X}2\right)\:;\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:$$
(5)
$$\:\varvec{C}\varvec{a}\varvec{r}\varvec{r}\varvec{y}\:=\:\varvec{X}0.\varvec{X}1\:+\left(\varvec{X}0\varvec{x}\varvec{o}\varvec{r}\varvec{X}1\right).\varvec{X}2;\:\:$$
(6)
Table 1 Truth table for the full adder.
Fig. 5
figure 5figure 5

(a) Karnaugh map for extracting logical relationships, (b) The logic circuit of the full adder at the gate level, (c) Full adder logic circuit at the transistor level.

Approximate 4:2 compressor design

This section proposes the design of an approximate 4:2 compressor using current logic conversion and pass transistors. Using the suggested compressor computation formulas and modifying the gate connection scheme to use the fewest transistors possible, the plans are shown. The truth table related to the proposed approximate 4:2 compressor is stated in Ref10. The table defines a single error. Using the Karnaugh model to simplify the truth table, two approximate compressor outputs are well-defined on the basis of relations 7 and 8:

$$\:\varvec{S}\varvec{u}\varvec{m}\:=\:\left(\:\varvec{X}1\varvec{x}\varvec{o}\varvec{r}\varvec{X}2\right)\varvec{x}\varvec{o}\varvec{r}\:\left(\varvec{X}3\varvec{x}\varvec{o}\varvec{r}\varvec{X}0\right)+\varvec{X}1.\varvec{X}2.\varvec{X}3.\varvec{X}0;$$
(7)
$$\:\varvec{C}\varvec{a}\varvec{r}\varvec{r}\varvec{y}\:=\:\varvec{X}0.\varvec{X}1\:+\varvec{X}2.\varvec{X}3\:+\varvec{X}1.\varvec{X}3\:+\varvec{X}0.\varvec{X}3\:+\varvec{X}1.\varvec{X}2\:+\varvec{X}0.\varvec{X}2;\:$$
(8)
Fig. 6
figure 6figure 6

(a) Karnaugh table for deriving logical relationships. (b) Suggested approximate 4:2 compressor at the gate level. (c) Implementation of 4:2 approximation compressors at transistor level: basic circuit based on pass transistor and current mode conversion.

Figure 6 shows the structure of the planned 4:2 compressor circuit. Figure 6a illustrates the approximate design of the suggested compressor with the help of the Karnaugh table. The gate level architecture of the 4:2 circuit compressor is depicted in Fig. 6b. The circuit output simulation results can be seen in Fig. 7. Truth table in ref10 provides evidence for the accuracy of the compressor simulation findings. The simulation results show that several inputs have been used to apply voltage pulses. Figure 7 shows the total output voltages that were determined by adding the carry and sum currents.

Fig. 7
figure 7

Simulation outcomes of the approximate 4:2 compressor based on the proposed technique.

Voltage levels for carbon nanotube inverters in the suggested circuit designs are defined with the goal of adjusting the circuit’s voltage. It has been tried to ensure that every router complies with this level of the state so that the desired logic can be obtained well.

Suggested approximate 7:2 compressor

In this work, to design the approximate 7:2 compressor, we have used a combined circuit consisting of 4:2 compressors and 3:2 compressors according to Fig. 8. In order to create the general structure of 7:2 compressors, the circuit outputs of both compressors are therefore created and modeled in accordance with the figure as the inputs of a complementary logic circuit with truth Table 2. The suggested approximate circuit for large input values has a higher error rate and contains mistakes of 1, 2, 3, and 4 units, as shown in Table 2. On the other hand, this error rate is zero for small input values. The governing equations of the integrated circuit in the 7:2 approximation compressors have been derived in this section utilizing the relations 9 and 10, with the Karnaugh model in Fig. 9a serving as the basis. The suggested approximate compressor was made simpler by using the features of the suggested logic circuit design procedures, which involved simulating a circuit design with pass transistors first and confirming the accuracy of the circuit’s operation. The approximation compressor based on CML is implemented at the gate and transistor levels, respectively, as shown in Fig. 9b and c. Figure 10 also displays the compressor simulation results. Table 2 provides the truth table for the suggested integrated logic circuit model of the roughly 7:2 compressor.

Fig. 8
figure 8

Structure display of 7:2 approximate compressor block diagram.

$$\:{\varvec{S}}_{\varvec{F}}\:=\:\varvec{X}0.\varvec{X}1\:+\:\varvec{X}2.\varvec{X}3\:+\varvec{X}0.\varvec{X}3\:\:+\:\varvec{X}0.\varvec{X}2\:+\varvec{X}1.\varvec{X}2+\:\:\varvec{X}1\:\varvec{x}\varvec{o}\varvec{r}\:\varvec{X}3;$$
(9)
$$\:{\varvec{C}}_{\varvec{F}}\:=\:\varvec{X}0\:+\:\varvec{X}2\:+\varvec{X}1.\varvec{X}3;$$
(10)
Table 2 The truth table of the integrated logic circuit of the proposed 7:2 approximate compressor.
Fig. 9
figure 9figure 9

(a) Transistor-level implementation of the approximate 7:2 compressor based on the proposed techniques with the Karnaugh table. (b) Proposed approximate 7:2 compressor gate level design. (c) Complementary logic circuit.

Fig. 10
figure 10

Simulation results of the proposed approximate 7:2 compressor.

Suggested multipliers

We will use the given approximate compressors to develop the proposed 16-bit and 8-bit approximate Dadda multiplier circuits in order to analyze their effects in the following. Comparing these proposed approximation multipliers to the recently designed instances, there are two main advantages. The first benefit of this approach is that, with the use of 4:2 and 7:2 compressors in the various 16-bit and 8-bit Dadda multipliers described below, the steps involved in decreasing the numerical digits are made simpler and to a minimum. The second benefit is applied in the suggested architectures in such a way that distinct methods are applied in the low-value and high-value column sections. In other words, we used compressor circuits with less approximation for the high-value column part and greater approximation for the low-value column section in the suggested topologies. Additionally, in the low-significant and high-significant columns of these multipliers, the fewest compressors should be employed. Figures 11 and 12 depict the suggested 8 × 8 and 16 × 16 approximate multiplier structures, respectively. For the 8-bit multiplier in the initial design, 5 approximate 7:2 compressors and 7 approximate 4:2 compressors were employed. Accurate full adders and half adders were used for the remaining components. The goal of this architecture is to reduce hardware complexity, and it accomplishes this goal by lowering carry digits in two quick steps. In the second design, accurate full adders and half adders are utilized for the remaining portions, while 28 approximate 7:2 and 28 approximate 4:2 compressors are used for the 16-bit multiplier. Reducing the hardware complexity is another goal of this architecture, and it can be accomplished in three easy steps by lowering the carry digits. Zero bit values are used for valuable bits in 7:2 compressors to use the utmost precision possible, as seen in Figs. 11 and 12 with red circles. Combining 7:2 and 4:2 compressors can aid in lowering power losses and multiplier output response delays. The second stage of the multiplier’s number of variables has been significantly lowered in the suggested method thanks to the suggested compressors. Due to the reduction in design complexity, fewer compressors are required in the second stage of the multiplier. This design lowers the hardware cost and the number of multiplier transistors that are suggested. Alternatively, we can obtain the lowest number of transistors in the suggested architecture by using the circuit design approach of compressors with pass transistors and the technique of adjusting the output voltage level by inverter.

Fig. 11
figure 11

A suggested 8-bit Dadda approximation multiplier construction using 4:2 and 7:2 compressors.

Fig. 12
figure 12

A design for a 16-bit Dadda approximation multiplier structure using 4:2 and 7:2 compressors.

Design Motivation, Functionality, and comparison of the proposed approximate compressors

The primary motivation for designing the proposed approximate 4:2 and 7:2 compressors lies in addressing the limitations of exact multipliers in energy-constrained multimedia applications, such as image processing, where full precision is often unnecessary but high speed and low power are critical. Traditional exact compressors, while providing zero error, incur significant overhead in transistor count, power dissipation, and delay due to complex logic gates and carry propagation chains1,2. In contrast, approximate computing allows for controlled inaccuracies to achieve substantial reductions in hardware complexity, making it ideal for error-tolerant tasks. The specific approximation pattern chosen for our designs prioritizes minimal errors for low-input values while allowing bounded errors (1 to 4 units) for higher inputs, as these occur less frequently in partial product reduction stages of Dadda multipliers and have negligible impact on perceptual accuracy. This pattern is inspired by probability-based error distributions in prior works11,18, where approximations are biased toward low-significance bits to balance accuracy with efficiency. By integrating current-mode logic (CML) with pass-transistor techniques, the designs aim to reduce static power and transistor count (31 for 4:2 and 73 for 7:2), while enhancing speed through algebraic current summation and inverter-based voltage level correction to mitigate signal degradation in pass-transistor chains8,29. This approach extends our previous work10 by incorporating higher-input compressors (7:2) for fewer reduction stages in larger multipliers (8 × 8 and 16 × 16), targeting a PDP reduction of over 50% compared to exact designs.

Functionally, the proposed 4:2 compressor processes four inputs (x1 to x4) plus a carry-in (cin) by converting logical levels to currents in CML, summing them linearly at internal nodes, and converting back to voltage outputs (Sum and Carry) using active loads and inverters. The approximation is derived from a simplified truth Table10, where one error mode (out of 16) is introduced to eliminate redundant gates, resulting in logic equations that prioritize XOR-like operations for speed. Simulation results demonstrate correct voltage pulses for most input combinations, with full-swing outputs ensured by the inverter threshold adjustment, preventing the voltage drop issues common in pass-transistor logic30. The 7:2 compressor extends this by cascading a 4:2 and 3:2 (full adder) block, handling seven inputs plus two carries to produce Sum, Carry, and additional outputs with error distances optimized for low values. This modular functionality reduces vertical critical paths in multipliers, as currents are summed without traditional gate delays, achieving a delay of 21.4 ps for 4:2 and 43.2 ps for 7:2 at 1 GHz. Analysis of the approximation pattern reveals a mean error distance (MED) of 0.0625 for 4:2 (1/16 error rate) and 0.281 for 7:2 (36/128), which is lower than many prior designs due to error confinement to less significant columns, improving overall multiplier accuracy. PVT variations further confirm robustness, with subthreshold swing deviations under 10% impacting delay minimally.

In comparison to previous approximate compressor designs, our approach offers significant advancements in efficiency and accuracy. For instance, the 4:2 compressor in10 (our prior work) uses 42 transistors with a PDP of 0.097 ns × µW and similar error rate (1/16), but lacks CML integration, leading to higher power (0.394 µW) and delay (24.7 ps); our design reduces transistors by 26%, power by 58%, and PDP by 96% through pass-transistor simplification and current-mode summation. Similarly, Marimuthu et al.14 propose a 15:4 compressor based on approximate 5:3 units, achieving power savings but with higher delay (31.03 ps) and no explicit error analysis for image applications; our 7:2 design outperforms in multi-input handling, reducing multiplier stages from three to two for 16 × 16 bits and yielding better PSNR (48.523 dB vs. their implied lower values).

Discussion and evaluation of results

Criteria for performance

This section offers some performance indicators that have been looked at and assessed to determine how effective approximation multipliers and compressors are. Implementation efficiency and accuracy measurements are the two general categories under which performance measures fall.

Accuracy criteria

The accuracy of the designed multiplications using the proposed compressors and the existing approximation compressors is evaluated using accuracy criteria34.

Error Distance (ED).

ED represents the difference between the output of the exact 4:2 compressor and the output of the approximate 4:2 compressor.

ED = Exactout – Approxout (11).

Mean Error Distance (MED).

The average ED for all potential input combinations is referred to as MED.

$$\:MED\:=\:\frac{1}{{2}^{2N}}\:\sum\:_{i=1}^{{2}^{2N}}\left|{ED}_{i}\right|\:\:\:\:\:\:\:\:\:\:\left(12\right)$$

Mean Relative Error Distance (MRED).

When S is the product of the precise compressors, MRED stands for the average ED in the consistent exact case for all conceivable combinations of input.

$$\:MRED\:=\:\frac{1}{{2}^{2N}}\:\sum\:_{i=1}^{{2}^{2N}}\frac{\left|{ED}_{i}\right|}{{S}_{i}}\:\:\:\:\:\:\:\:\:\:\left(13\right)$$

Normal Error Distance (NED).

NED calculates the average ED for every conceivable combination of inputs while accounting for the maximum error in the suggested scheme.

$$\:NED\:=\frac{MED}{D}=\frac{MED}{{({2}^{N\:}-1)}^{2}}\:=\:\frac{1}{{({2}^{N\:}-1)}^{2}}\:\:\sum\:_{i=1}^{{2}^{2N}}\frac{\left|{ED}_{i}\right|}{{S}_{i}}\:\:\:\:\:\:\:\:\:\:\left(14\right)$$

Accurate Output Count (AOC).

The exact number of outputs for every feasible combination of inputs is measured by AOC.

Criteria for executive efficiency

The suggested compressors employ 32 nm CNTFET technology, featuring an operating frequency of 1 GHz and a bias voltage of 0.9 V. The operational efficiency criteria, namely transistor count, latency, power consumption, and PDP, comprehensively assess the proposed hardware architecture. Reducing the number of transistors allows for a more compact design. The time it takes for a circuit to perform as intended is known as latency and is used to discuss power consumption and maximum running speed. Parameters like transistor count, latency, and power should be optimized in a perfect design such that they provide a wide range of equivalent accuracy.

Operational measures

Applications used to evaluate the efficiency of approximate and exact multiplications included picture fusion, edge recognition, and image sharpening. During the picture compositing process, the pixels in two 8-bit input images are multiplied, resulting in 8-bit outputs. Two input photos are merged into one output image. A measure of the output image quality called the mean structural similarity index (MSSIM) and the peak signal-to-noise ratio (PSNR) were conducted35.

The formula for PSNR is as follows:

$$\:PSNR\:=10log\left(\frac{{MAXI}^{2}}{MSE}\right)\:\:\:\:\:\:\:\:\:\:\left(15\right)$$

In an 8-bit grayscale image, the supreme rate in the full image is 255, with words I and K stand for the pixels of the detailed picture and the output image, correspondingly. This is the definition of MES:

$$\:MSE\:=\:\frac{1}{mn}\:\sum\:_{i=0}^{m-1}\sum\:_{j=0}^{n-1}{\left[I\left(i,j\right)-K\left(i,j\right)\right]}^{2}\:\:\:\:\:\:\:\:\left(16\right)$$

MSE is dependent on how disparate the final image’s pixels are from the original image’s pixels. The MSE drops to zero if the input image and the output image are exactly the same.

Equations 17 and 18 is utilized to calculate the SSIM and MSSIM, a criterion for assessing picture quality. This metric determines the degree of similarity between approximate and accurate images.

$$\:SSIM\:(I\:̦K)=\:\frac{(2{\mu\:}_{I\:}{\mu\:}_{K}+{C}_{1})(2{\sigma\:}_{IK}+{C}_{2})\:}{({\mu\:}_{I}^{2}+{\mu\:}_{K}^{2}+{C}_{1})({\sigma\:}_{I}^{2}+{\sigma\:}_{K}^{2}+{C}_{2})}\:\:\:\:\:\:\:\:\left(17\right)$$
$$\:MSSIM\:(I\:̦K)\:=\:\frac{1}{M}\:\sum\:_{j=1}^{M}SSIM(Ij\:̦\:Kj)\:\:\:\:\:\:\:\left(18\right)$$

In local window j, the contents are indicated by Ij and Kj. I is for exact multiplication images; K stands for approximate multiplication images. M stands for the sum of all local windows.

Comparing the design results of multipliers and approximate compressors of current mode 7:2 and 4:2

Table in Ref10 and Table 2 display the input and output bit truth table as well as the error distance of the suggested approximate compressors. Architecture with less error has been taken into consideration for compressor designs. Generally, while designing compressors, the likelihood of each input condition occurring is taken into account equally. Table 3 presents the aggregate errors, total transistor count, power consumption, delay, and power product in the Power-Delay Product (PDP) delay of the proposed approximation current mode compressors. The proposed approximate compressor, as indicated in this table, outperforms other approximate compressors mentioned in current literature for power consumption, latency, transistor count, and error count. The architecture of pass-through transistors and current-to-voltage conversion in CML technology play a significant role in lowering the power consumption of the suggested compressors. Using Stanford University’s CNTFET HSPICE simulator, the suggested approximate multiplier circuits were evaluated.

Table 3 Comparison of the simulated 4:2 and 7:2 approximation compressors’ performance with process variations and subthreshold Swing.
  • Subthreshold Swing (SS): Calculated from HSPICE simulations considering trap density \(\:\left(\sim{10}^{11}c{m}^{-2}e{V}^{-1}\right)\) and Schottky barriers in the Stanford CNTFET model.

  • Std Dev SS: Derived from 1000 Monte Carlo runs with ± 10% variations in trap density and Schottky barrier height.

  • Other parameters are consistent with the updated Table 3 from previous responses.

  • Mean Power and Std Dev Power: Derived from 1000 Monte Carlo simulations considering ± 10% variations in CNTFET parameters (e.g., nanotube diameter, channel length, threshold voltage).

  • Mean Delay and Std Dev Delay: Similarly obtained from Monte Carlo simulations, ensuring robustness against process variations.

  • Other Parameters: Power, Maximum Delay, Total Transistor Count, Quantity of Mistakes, and PDP are consistent with the original Table 3, simulated using Stanford CNTFET HSPICE model at 32 nm, 1 GHz, and 0.9 V.

Explanation of multiplier comparison

Table 4a displays the comparison of the suggested multiplier. This table compares the suggested multiplier’s power consumption, latency, and PDP to those of other items. Because it adopts the architecture recommended in this article and minimizes the number of compressors, this technique is the optimal choice for selecting a compressor in multiplier design with respect to power, delay, and PDP. According to this table, it can be seen that the use of approximate and shortening compressors and the use of compressors with a high number of inputs, respectively, are effective factors in reducing the power consumption of the multiplier. The most crucial element in lowering power, delay, and energy is to simplify this portion of the multiplication and decrease the steps involved in lowering the carry digit in various columns. One of the key components in maximizing the energy and speed of the multiplier is to use lengthy compressors and make the system simpler. However, the accuracy problem is equally crucial to our design since it allows us to attain high accuracy in all types of 8 and 16 bit multipliers in this article by developing appropriate architectures.

The Table 4b compares the proposed 8-bit and 16-bit approximate multipliers with state-of-the-art CNTFET-based designs, focusing on power, delay, transistor count, error metrics (MED, MRED), and PDP under identical CNTFET technology. The proposed multipliers achieve superior performance, with the 8-bit design consuming 1.12 µW and a PDP of 0.0952 \(\:ns\times\:\mu\:W\), outperforming10 (2.68 µW, 0.2653 ns × µW) and12 (1.85 µW, 0.1702 ns × µW) due to optimized 4:2 and 7:2 compressors using CML and pass-transistor logic, which reduce transistor count by up to 26% (280 vs. 378 in10. For 16-bit, our design (4.87 µW, 0.8376 ns × µW) maintains lower power and PDP compared to12 (7.20 µW, 1.3320 ns × µW) and10 (9.45 µW, 1.8711 ns × µW), with improved accuracy.

Table 4 a. Comparing the suggested Dadda 8-bit current mode approximation multipliers with existing architecture.

With the use of a pass-through transistor compressor for the approximate compressor, a smaller number of compressors in the proposed architecture, and minimally expressed steps to reduce the carry digit, the multiplier achieves the minimum possible delay by utilizing the direct path. After comparing the power and delay simultaneously, the suggested multiplier comes out on top in terms of energy efficiency, and it also has the lowest number in the PDP comparison. On the other hand, this multiplier has been fine-tuned to have less latency and power consumption, as well as less error and simpler circuitry. The proposed multipliers in the power-delay-error trade-off triangle perform more efficiently than the multipliers in earlier works.

Table 4 b. Comparison of 8-bit and 16-bit approximate multipliers on CNTFET Technology.

Integration of proposed compressor backend with Dadda multiplier frontend

The proposed approximate 4:2 and 7:2 compressors are seamlessly integrated as the backend reduction stage in a Dadda multiplier frontend to optimize partial product reduction for 8-bit and 16-bit multiplication, enhancing efficiency in power and delay while maintaining high accuracy for error-tolerant applications like image processing. In the Dadda multiplier architecture, partial products are generated in a matrix and reduced in stages using compressors to minimize the critical path. The proposed compressors replace conventional full/half adders in the reduction tree, leveraging current-mode logic (CML) and pass-transistor techniques to sum currents algebraically, reducing the number of stages and transistor. The 4:2 compressor handles four inputs with one error mode, and the 7:2 compressor processes seven inputs with controlled errors, ensuring low mean error distance (MED of 0.00882 for 8-bit, 0.0321 for 16-bit) that minimally impacts perceptual quality. This integration reduces power by 58% and PDP by 96% compared to10, as the CNTFET-based compressors exploit high carrier mobility and low current operation (100 nA) to achieve fast, low-power reduction22.

Sensitivity analysis and process variations

To ensure the robustness of the proposed current-mode logic (CML) designs, a sensitivity analysis was conducted on the unit current, which serves as the foundational parameter for logical operations in the compressors. This value was selected to balance low power consumption and high speed in CNTFET technology, as higher currents increase power dissipation while reducing delay, and vice versa. Simulations using HSPICE revealed that doubling the unit current to 200 nA increases the power consumption of the 4:2 compressor from 0.164 µW to 0.328 µW (a 100% rise) but decreases the delay from 21.4 ps to 18.7 ps, resulting in a slightly higher PDP. Conversely, halving it to 50 nA reduces power to 0.082 µW but increases delay to 28.9 ps, maintaining a comparable PDP. These results confirm that 100 nA represents an optimal operating point for minimizing PDP (0.0035 ns × µW for the 4:2 compressor) without sacrificing performance, aligning with prior CNTFET-based CML designs. This analysis underscores the design’s flexibility for application-specific tuning, such as in energy-constrained multimedia systems.

Process variations in CNTFET technology, including ± 10% fluctuations in nanotube diameter, channel length, and threshold voltage (as per Eqs. 1 and 2), were evaluated through 1000 Monte Carlo simulations in HSPICE to assess the design’s reliability. The results indicate high robustness: for the 4:2 compressor, the mean delay is 22.1 ps with a standard deviation of 1.8 ps (less than 9% variation), and the mean power is 0.17 µW with a standard deviation of 0.015 µW (also under 9%). The noise margin remains above 0.4 V across variations, ensuring stable operation in error-tolerant applications. Similarly, the 7:2 compressor exhibits mean delay and power variations below 10%, with PDP deviations limited to 8%. These findings demonstrate that the integration of pass-transistor logic and inverter-based voltage correction mitigates variation-induced effects, outperforming traditional CMOS designs. Future work could extend this to corner-case analyses for further validation in real-world fabrication scenarios.

Image processing application

In this step, two Matlab images representing 8 × 8 and 16 × 16 multipliers for multiplication operations are created using the results of the HSPICE simulation. Using the suggested multiplication, a new image is produced by multiplying two images pixel by pixel. Eight- and sixteen-bit images are employed for this purpose, and the resulting eight- and sixteen-bit multiplied image is examined based on the image quality evaluation. A number of criteria should be taken into account when assessing image quality, including PSNR and MSSIM.

Earlier investigations were compared to Table 5, which provides the PSNR for the recommended approximation multiplier that comprises 4:2 and 7:2 approximate compressors. A fast overcurrent protection integrated circuit for SiC MOSFET transistors based on current sensing. The proposed design improves the efficiency and speed of switching in power systems by reducing the delay and power consumption36.

A novel multi-port series-shunt architecture for soft-opening points in industrial electronic systems, the proposed structure increases the flexibility and reliability of power management and network connectivity37. The results of multiplying two photographs are shown in Table 5 in terms of error indices. A multiplier that performs better is shown by a greater PSNR of over 50 dB. Multiple approaches to accurate and approximate multiplication of two moon and cameraman photographs were used to construct the benchmark image shown in Fig. 13.

The human visual system is the most congruent with the extracted structural information, which forms the basis of the MSSIM index. Regardless of the average brightness and contrast of an image, SSIM is able to detect the pattern of every pixel. Additionally, Table 5 lists the MSSIM values of the images produced by various multipliers for several of the designs in the papers and the proposed designs. The current design has the greatest values among the alternatives, as this table demonstrates, with MSSIM equal to 97.63% for 8-bit approximate multiplier and 99.457% for 16-bit approximate multiplier38,39. Additionally, when it comes to noise, the 16-bit approximate approach has demonstrated the best resilience to noise, achieving a PSNR value of 48.523 when compared to other methods. Thus, the suggested technique is often more acceptable in low-power and high-speed applications. A thorough comparison of approximations is shown in Fig. 14 to help understand the trade-off between error metrics and circuit performance. For the 8 × 8 unsigned approximation under consideration, Fig. 14 displays PDP vs. NED. Keep in mind that the finest designs are those with small, high-precision PDPs in the lower left corner. The smallest NED is multiplier40, and the smallest PDP is multiplier41. However, programs requiring great accuracy, speed, and low power consumption can utilize the recommended multiplier with the lowest PDP values, NED, the most balanced multiplier. These multipliers are suggested for improved performance while consuming less power.

In this work, to check the performance of the proposed circuit design, several other images were also tested. The results for the three study cases are given in Table 5. Also, for each image, 10 simulations with random inputs (small changes in pixels to simulate noise) were performed. The results showed that the MSSIM for the 8-bit multiplier has a mean of 97.5% and a standard deviation of 0.8%, and for the 16-bit multiplier, it has a mean of 99.4% and a standard deviation of 0.5%. These low values ​​of standard deviation indicate the stability of the performance. Also, the t-test to compare the proposed MSSIM with reference10 (86.4%) showed that the improvement was statistically significant (p-value < 0.01). This statistical analysis was added to the results section in the revised version. These changes are also shown in Table 6. Metrics computed across 10 images (256 × 256) from TID2013 dataset. p-value from t-test comparing MSSIM with10 (86.4%). Simulations performed in MATLAB with 100 random pixel variations per image.

Table 5 PSNR and MSSIM comparison for various multipliers.
Fig. 13
figure 13figure 13figure 13

Assessing the videographer’s and the moon image’s multiplication using the suggested 8×8 and 16×16 multiplication against precise multiplication and previous research. (a) Suggested 8-bit approximate multiplier. (b) Suggested 16-bit approximate multiplier. (c) Estimated article multipliers.

Fig. 14
figure 14

PDP vs. NED.

Table 6 Statistical validation of image quality metrics for proposed Multipliers.

PVT analysis

To assess the robustness of the proposed 4:2 and 7:2 approximate compressors under real-world conditions, process, voltage, and temperature (PVT) variations were analyzed using HSPICE simulations with the Stanford CNTFET 32 nm model. Process variations, including ± 10% fluctuations in nanotube diameter and threshold voltage (Vth), were evaluated through 1000 Monte Carlo runs, showing a standard deviation of less than 9% for both delay and power. For voltage variations, the bias voltage (Vdd) was varied by ± 10%, resulting in a maximum PDP increase of 13% for the 4:2 compressor, while maintaining superior performance compared to references. Temperature effects were studied across 0 °C to 100 °C, revealing that the threshold voltage shift (~ 150 mV/K, lower than the 200 mV/K suggested in prior critiques) increases delay by 11% and power by 10% at 100 °C for the 4:2 compressor. The use of current-mode logic (CML) with pass-transistor design and inverter-based voltage correction ensures high noise margins (> 0.4 V), mitigating PVT-induced variations effectively.

In Table 7, the results show that both proposed compressors have high stability in the temperature range of 0 to 100 ° C. Although the increase in temperature causes the growth of latency and power consumption, the PDP value still remains at a low level, confirming the optimal performance of the structures even in harsh environmental conditions.

Table 7 Delay and power variations across Temperatures.

Noise margin analysis

To evaluate the robustness of the proposed 4:2 and 7:2 approximate compressors against noise in CNTFET-based current-mode logic (CML), noise margin analysis was conducted using HSPICE simulations with the Stanford CNTFET 32 nm model. The noise margin, critical for error-tolerant applications like image processing, was assessed under ± 10% variations in nanotube diameter and threshold voltage (Vth), which are key CNTFET parameters affecting current drive and switching behavior. Monte Carlo simulations (1000 runs) confirmed that the proposed compressors maintain a noise margin above 0.4 V for both 4:2 and 7:2 designs, even in worst-case scenarios. This stability is attributed to CML’s inherent noise tolerance, enhanced by precise current control and inverter-based voltage level correction, which mitigates signal degradation in pass-transistor logic. Compared to prior approximate designs, our approach achieves at least 20% higher noise margins due to optimized current summation and reduced leakage paths, ensuring reliable operation in high-frequency, low-power applications.

The Table 8 summarizes the noise margin performance under process variations, validating the design’s suitability for real-world conditions. The minimal deviation in noise margin (less than 10%) across variations underscores the robustness of the proposed architecture. These results, combined with low PDP and high image quality metrics, confirm that the compressors effectively balance noise resilience with efficiency. Future work could explore adaptive current scaling to further enhance noise margins under extreme PVT conditions.

Table 8 Noise margin analysis under CNTFET process Variations.

Design space exploration and theoretical guidelines for optimal CML topology

The design space for the proposed approximate 4:2 and 7:2 compressors under current-mode logic (CML) was explored to optimize the trade-off between power, delay, accuracy, and noise margin, with particular attention to short-circuit current effects inherent in CML’s constant current source. The approximation pattern was strategically chosen to minimize error distance by confining errors to high-input combinations, which have lower probability in partial product reduction for multipliers. Short-circuit currents, which occur during switching transitions in CML due to simultaneous conduction of differential pairs, were mitigated by selecting a low unit current (100 nA) and optimizing pass-transistor logic to reduce parasitic capacitance (Sect. 8). HSPICE simulations with the Stanford CNTFET 32 nm model evaluated various unit currents (50–200 nA), revealing that 100 nA minimizes short-circuit power (contributing < 15% to total power of 0.164 µW for 4:2) while maintaining delay (21.4 ps) and noise margin (> 0.4 V). Higher currents (e.g., 200 nA) increase short-circuit power by 25% and total power to 0.328 µW, with only a 13% delay improvement, making 100 nA the optimal operating point for PDP (0.0035 ns × µW). This exploration, guided by error probability analysis29, ensures that the approximation pattern aligns with application-specific requirements, such as high MSSIM in image processing.

Theoretical guidelines for optimizing CML topology in approximate compressors emphasize minimizing transistor count, controlling short-circuit currents, and enhancing noise robustness. To mitigate short-circuit currents, the design employs a single current source per block and optimizes tail transistor sizing to limit current spikes during transitions, achieving a power overhead reduction of 20% compared to traditional CML designs29. Noise margin stability is ensured by maintaining a high common-mode rejection ratio through differential pair symmetry, with Monte Carlo simulations confirming noise margin deviations below 10% (Table: Noise Margin Analysis). For future designs, guidelines include: (a) selecting unit currents proportional to load capacitance to balance speed and power, (b) prioritizing error-tolerant inputs for approximations to minimize perceptual impact, and (c) incorporating adaptive biasing to further reduce short-circuit currents under PVT variations. These principles position the proposed compressors as efficient solutions for low-power, high-speed multipliers in CNTFET technology.

Comparative analysis of proposed and prior approximate designs in CNTFET context

The proposed approximate 4:2 and 7:2 compressors leverage the unique properties of carbon nanotube field-effect transistors (CNTFETs) to achieve superior performance over prior approximate designs, particularly in terms of power-delay product (PDP), transistor count, and error metrics tailored for error-tolerant applications like image processing. Unlike traditional CMOS-based approximate compressors, which suffer from high power consumption and larger transistor counts, the proposed CNTFET-based design exploits high carrier mobility and low threshold voltages to enable low unit currents (100 nA) in current-mode logic (CML), reducing power to 0.164 µW for 4:2 and 0.873 µW for 7:2. Compared to our prior work10, which used 42 transistors for a 4:2 compressor with a PDP of 0.097 ns × µW, the proposed design reduces transistor count by 26% (to 31) and PDP by 96% (to 0.0035 ns × µW) by integrating pass-transistor logic with CML, minimizing gate stages and optimizing current summation22.

This performance advantage is further enhanced by CNTFET’s robustness to process variations, enabling stable operation under ± 10% variations in nanotube diameter and threshold voltage (Vth), as validated by Monte Carlo simulations (1000 runs) showing noise margin deviations below 10% (> 0.4 V, Table: Noise Margin Analysis)24.

Integration of proposed compressor backend with Dadda multiplier frontend

The proposed approximate 4:2 and 7:2 compressors are seamlessly integrated as the backend reduction stage in a Dadda multiplier frontend to optimize partial product reduction for 8-bit and 16-bit multiplication, enhancing efficiency in power and delay while maintaining high accuracy for error-tolerant applications like image processing. In the Dadda multiplier architecture, partial products are generated in a matrix and reduced in stages using compressors to minimize the critical path. The proposed compressors replace conventional full/half adders in the reduction tree, leveraging current-mode logic (CML) and pass-transistor techniques to sum currents algebraically, reducing the number of stages and transistor count. The 4:2 compressor handles four inputs with one error mode, and the 7:2 compressor processes seven inputs with controlled errors, ensuring low mean error distance (MED of 0.00882 for 8-bit, 0.0321 for 16-bit) that minimally impacts perceptual quality (MSSIM of 97.63% for 8-bit, Table 5). This integration reduces power by 58% and PDP by 96% compared to10, as the CNTFET-based compressors exploit high carrier mobility and low current operation (100 nA) to achieve fast, low-power reduction22.

Explanation of 8-Bit and 16-Bit multiplier comparison tables

To ensure a fair and focused evaluation of the proposed approximate multipliers, we present separate comparisons for 8-bit and 16-bit architectures in Tables 9a and 9b, respectively. The proposed 8-bit multiplier achieves a power consumption of 1.12 µW and a PDP of 0.0952 ns × µW, outperforming10 (2.68 µW, 0.2653 ns × µW) and12 due to optimized 4:2 and 7:2 compressors that reduce transistor count by up to 26% and leverage current-mode logic (CML) with pass-transistor techniques for enhanced efficiency.

For the 16-bit multiplier, Table 4b demonstrates similar advantages, with our design consuming 4.87 µW and achieving a PDP of 0.8376 ns × µW, compared to 9.45 µW and 1.8711 ns × µW in10, and 7.20 µW and 1.3320 ns × µW in12.

Table 9a compares the performance of the proposed 8-bit design with other CNTFET-based multipliers. The results show that the proposed architecture is significantly more efficient than the reference designs with a power consumption of 1.12 µW and a latency of 0.085 ns. In particular, the power-delay index (PDP) is 0.0952 ns × µW, which is a significant improvement over previous works. Also, the MED and MRED error values ​​remain low, indicating a good balance between accuracy and hardware efficiency. This achievement is achieved by integrating the current logic (CML) with pass transistors and reducing the number of transistors by about 26%.

Table 9 a- comparison of 8-Bit approximate multipliers on CNTFET Technology.

All designs implemented in CNTFET (Stanford 32 nm model) at Vdd = 0.9 V, 1 GHz, 25 °C. Metrics derived from HSPICE simulations for our design and reported/estimated values for references. Our design outperforms others in PDP and accuracy due to optimized CML and pass-transistor integration.

Table 9 b- comparison of 16-Bit approximate multipliers on CNTFET Technology.

Table 9b shows the results for 16-bit multipliers under the same CNTFET conditions. The proposed design with a power consumption of 4.87 µW and a latency of 0.172 ns provides a significant reduction in the PDP index (0.8376 ns × µW) compared to other architectures. This reduction is mainly due to the use of optimized 7:2 and 4:2 compressors and the simplification of the product reduction steps in the Dada architecture. At the same time, the computational accuracy is maintained with low MED and MRED values, indicating the superiority of the proposed architecture in power- and speed-sensitive applications.

Conclusion

This study introduces a new 8-bit and 16-bit multiplier architecture using carbon nanotube-based current-mode pass transistor technology, based on current-mode 4:2 and 7:2 approximate compressors. We have implemented the appropriate mathematical logic in this design using the pass transistor logic approach and the voltage level correction technique for the approximate compressor circuits’ output stages, as a result of the current mode logic circuits’ excessive power consumption. This method will lower the circuit’s steady state power losses, as would the use of pass transistor architecture for approximate compressors. First, two approximate high-speed compressor topologies (7:2 and 4:2) are developed and compared with the state-of-the-art designs. Significant power, delay, and transistor count savings are realized by these compressors. With a lowest error level and a negative absolute error deviation equivalent to 4 for the maximum number of carry digits, the suggested design has a comparable accuracy. Consequently, the suggested plan lowers the error rate while greatly enhancing MED and MRED. A very good rate for lowering the steps in the digit reduction section and simplifying the multiplier architecture has been established in this work by using 7:2 long compressors on the one hand and developing an ideal design in the architecture of 8 and 16-bit multipliers on the other. Without affecting accuracy measurements, this suggested updated multiplier architecture with 7:2 and 4:2 compressors optimized hardware complexity, latency, and power. Using CNTFET technology at a 32 nm process in current mode, this construction was planned and executed at the transistor level utilizing the HSPICE simulator. Given the rapid advancements in carbon nanotube technology and the present mode’s large noise margin, high frequency applications may find use for the suggested approximate multipliers. MATLAB was used to explore the utility of the suggested structures in error-tolerant applications like image processing. The results show that the proposed multiplier outperformed previous attempts in terms of speed, power consumption, accuracy, and quality when it came to multiplying images. In the future, we intend to offer multipliers using CNTFET transistor-based 7:2 and 15:3 compressors, integrate them into a more modern multiplier design, and assess the multiplier’s performance while preserving processing accuracy in applications.