Abstract
Piezoelectric MEMS speakers, an emerging technology with great promise, face significant challenges in performance evaluation and rational design. Their broadband nature means that responses at every frequency point across the whole operating bandwidth contribute to performance, yet there is no widely recognized weighting approach for fair evaluation. This absence of quantitative criteria makes objective comparisons of different designs difficult, slowing the adoption of new design concepts, and it leads to ambiguous design goals without response balance across frequency bands. Additionally, the current design methods rely on labor-intensive simulations, further prolonging the development process. To address these challenges, two figures of merit (FOMs) obtained via theoretical deduction are proposed in this study. These FOMs facilitate the evaluation of key metrics, such as sound pressure level and energy efficiency over a wide frequency range, enabling quantitative comparisons among various speaker designs. On the basis of FOMs, the design process can be simplified into a single-objective optimization problem, significantly streamlining the speaker design. Using this method, piezoelectric MEMS speakers with ultra-high FOMs and superior performance are demonstrated. The normalized SPLs at 1 and 10 kHz reach 76.6 and 86.6 dB/mm²/Vrms, respectively, with normalized sensitivities of 91.2 and 91.5 dB/mm²/mW. This achievement validates our FOM theory, representing a notable advancement in the field.

Introduction
In recent years, piezoelectric MEMS speakers have garnered significant interest as audio solutions across the consumer, medical, and industrial fields. They offer unique advantages over conventional micro speakers, such as compact size, excellent high-frequency response, low power consumption, robustness against dust and water, ease of mass production, and integration compatibility with electronic components1,2,3,4. Despite these benefits, MEMS speakers have yet to challenge the market dominance of traditional electrodynamic and balanced armature speakers. This is primarily due to their limitations, such as low output sound pressure level (SPL), high driving voltage, and narrow bandwidth1,5,6.
In response to these challenges, researchers have explored a range of designs. In 2018, Stoppel et al. proposed a diaphragm structure composed of four triangular cantilever beams7,8, which significantly increased deflection and established mechanically decoupled diaphragms as the mainstream approach in piezoelectric MEMS speaker design. They subsequently modified their design from a triangular cantilever beam to a rectangular shape to increase the effective radiating area, achieving a 10 dB increase in the normalized SPL at 30 Hz9. Cheng et al. proposed a series of speakers that use spring actuators to drive a central diaphragm, thereby enhancing the low-frequency response10,11,12. Many subsequent works13,14,15 focused on developments of this design philosophy, such as improving the size and shape of the actuators and central diaphragm and enhancing connection methods. However, despite these improvements in the quasi-static response, these new designs also shift the resonant frequency downward, which subsequently reduces the high-frequency response. This is because the increase in quasi-static sound pressure is largely due to reduced structural stiffness, which in turn lowers the resonant frequency. Additionally, there is another type of speaker that does not use a piezoelectric actuator as the acoustic surface but rather as a driving source for a passive diaphragm16,17. The displacement is transferred through a heavy coupling block. Although this approach maximizes the equivalent radiating area, it introduces an additional mass into the system, which also causes a decrease in the high-frequency response.
The aforementioned works reflect two major issues in MEMS speaker unit research. First, the structure design is tricky, as it requires balancing various metrics. Enhancing the SPL in specific frequency bands often involves other trade-offs, such as a reduced response in other frequency ranges. Achieving high performance requires a careful balance. However, the current design of MEMS speakers relies heavily on time-consuming FEM simulations, making it difficult to achieve this balance. Second, it is challenging to evaluate and compare different designs due to the wideband nature of speakers and the unclear effects of performance variations. Notably, an improvement in performance within one frequency band is often accompanied by changes in many other bands, making it difficult to determine whether improvements arise from a truly better design or are simply the result of compensation. Additionally, as speakers are wideband devices, each frequency point within the operating frequency range contributes to the overall evaluation, further complicating the fair comparison of different designs. These two issues make it difficult to integrate advanced design concepts, leading to a time-consuming and labor-intensive design process, which ultimately hampers fast development in the field.
Given the aforementioned issues, the development of speaker units has reached a stalemate. Recent research has focused predominantly on topics other than speaker units, such as bandwidth enhancement based on multi-resonant monolithic integration15,18,19,20 and polymer coatings21,22,23,24. However, this does not imply that optimizing a single unit is unimportant or that the issue has received little attention; rather, research has reached a stage where further progress requires theoretical guidance. When a system has several competing metrics, introducing a figure of merit (FOM) to facilitate fair comparison is common. For MEMS speakers, establishing an appropriate FOM could help address the major challenges in design evaluation. A theoretical approach capable of quantitatively revealing the intrinsic correlations among performance parameters, on the basis of which an FOM can be derived, is crucial. Only through such an approach can the full potential of MEMS speakers be realized, enabling balanced audio designs. While there have been several theoretical studies of piezoelectric MEMS speakers, most have concentrated on building equivalent circuit models to enable rapid performance predictions25,26,27 rather than investigating the relationships among design variables. These models are complex, making it difficult to derive generalizable and abstract conclusions. Additionally, other studies have been limited by their focus on a single performance metric28, restricting their broad applicability. To date, no integrated metric such as an FOM has been proposed for evaluating MEMS speakers, nor has a method been introduced to shift design approaches away from repetitive trial-and-error simulations.
In this paper, two FOMs are proposed for evaluating the output frequency response and energy efficiency of MEMS speakers: broadband normalized voltage sensitivity (FOMvol) and broadband normalized power sensitivity (FOMsen). Additionally, precise expressions for these two FOMs are derived for cantilever-type diaphragms of arbitrary shape. Through analysis, the independence of these FOMs from planar dimensions is confirmed, and this conclusion extends to many other structures in addition to cantilevers. The rationality of using these two FOMs as broadband performance indicators for MEMS speakers is demonstrated. By optimizing the FOM as part of the design philosophy, we demonstrate a piezoelectric MEMS speaker that exhibits better normalized output SPL and energy efficiency than any current MEMS speakers, thus validating the effectiveness of the FOMs and the related theory.
Results and discussion
FOM theory
The cantilever piezoelectric speaker is one of the most representative types of piezoelectric MEMS speakers, and many other structures can be seen as variants of it. Therefore, the theoretical analysis in this paper is based on a cantilever beam piezoelectric MEMS speaker and rests on several assumptions:
1. A multilayered speaker integrates layers made of homogeneous and isotropic materials;
2. The nonlinearity and viscosity of the layers can be ignored, and the speaker works in a given linear region;
3. The residual stress in the speaker is neglected;
4. The maximum section width is not greater than five times the root width, so the displacement at each point in the same section can be considered the same.
The primary application scenario for MEMS speakers is in-ear applications, where the speaker’s sound pressure level (SPL) frequency response often resembles that shown in Fig. 1a. The sound pressure \(\Delta p\) can be expressed as follows29,30:
where V0 is the initial volume of the ear cavity, p0 is the ambient pressure, Gain is the gain of the human ear for sounds, which is a function of frequency, and ΔV is the air volume change associated with the vibration of the speaker, which can be represented as10:
where S represents the active area of the speaker, U is the driving voltage, \({\bar{\delta }}_{{st}}\) and \({\bar{\delta }}_{{dy}}\) are the average displacement of the diaphragm per unit voltage for static and dynamic conditions, respectively, ξ is the damping ratio, fin is the driving frequency, and fr is the first-order resonant frequency of the beam. By substituting (2) and (3) into (1) and excluding constants unrelated to device characteristics, the determinants of \({\overline{{\Delta p}_{k}}}\) and \({\overline{{\Delta p}_{m}}}\), which represent the output sound pressure per unit area per unit voltage for the stiffness control region (\({f}_{{in}}\ll {f}_{r}\)) and the mass control region (\({f}_{{in}}\gg {f}_{r}\)), respectively, are as follows:
Fig. 1: a A typical SPL frequency response curve of a piezoelectric MEMS speaker. b Three types of optimization for the frequency response in a pure cavity. i: costly quasi-static enhancements; ii: costly high-frequency enhancements; iii: perfect optimization. c Structural schematic of a MEMS speaker with a random planar shape
From (4) and (5), \({\bar{\delta }}_{{st}}\) is the only determinant of the quasi-static response, whereas this is not the case for the response in the mass control region, where fr must also be considered. This can be interpreted as follows: the speaker undergoes an amplitude decay with a slope of 12 dB/octave after the first-order resonance (ignoring the higher-order modes). For a given \({f}_{{in}}\gg {f}_{r}\), the larger fr is, the lower the amplitude decay of the output.
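The two asymptotic regions described here can be checked numerically. The sketch below assumes the standard single-degree-of-freedom amplitude factor, which may differ in detail from the expression used in this work; the function names and the numerical values of the resonant frequency and damping ratio are illustrative only.

```python
import math

def displacement_gain(f_in, f_r, xi):
    """Dynamic-to-static displacement ratio of a damped single-degree-of-freedom
    resonator (assumed model, not necessarily the paper's exact expression)."""
    r = f_in / f_r
    return 1.0 / math.sqrt((1.0 - r**2) ** 2 + (2.0 * xi * r) ** 2)

def to_db(ratio):
    return 20.0 * math.log10(ratio)

f_r = 5_000.0   # Hz, illustrative resonant frequency
xi = 0.05       # illustrative damping ratio

# Stiffness-controlled region (f_in << f_r): the gain stays close to 0 dB,
# so the output is set by the static displacement alone.
low_gain_db = to_db(displacement_gain(10.0, f_r, xi))

# Mass-controlled region (f_in >> f_r): the amplitude falls by roughly
# 12 dB for each doubling of the driving frequency.
octave_drop_db = (to_db(displacement_gain(8 * f_r, f_r, xi))
                  - to_db(displacement_gain(16 * f_r, f_r, xi)))

print(f"gain at 10 Hz: {low_gain_db:.3f} dB")
print(f"drop per octave far above resonance: {octave_drop_db:.2f} dB")
```

In this model the high-frequency amplitude approaches the static displacement multiplied by (fr/fin)², which is why a larger fr raises the whole mass-controlled branch.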
Therefore, both \({\bar{\delta }}_{{st}}\) and fr must be considered. As shown in Fig. 1b, optimizing only \({\bar{\delta }}_{{st}}\) may result in (i), and optimizing only fr may result in (ii). It is not conclusive that (i) or (ii) are better than the result indicated by the original curve, as they sacrifice high- and low-frequency performance, respectively. What is needed is (iii), which corresponds to larger \({\bar{\delta }}_{{st}}\) and fr. However, the two are, to some extent, mutually constrained: increasing the displacement usually reduces the resonant frequency. Therefore, the geometric mean of the determinants of \({\overline{{\Delta p}_{k}}}\) and \({\overline{{\Delta p}_{m}}}\) is used to determine a weighting scheme for \({\bar{\delta }}_{{st}}\) and fr in the optimization process, which we define as the speaker FOMvol. fin is not reflected in FOMvol because of its irrelevance given the device characteristics.
As shown in (7), FOMvol can also be regarded as the product of the quasi-static response and the resonant frequency, through which the entire frequency response curve can essentially be reconstructed. Therefore, FOMvol can represent the ability of a speaker to output sound pressure over its entire operating bandwidth per unit area and per unit voltage.
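A short numerical illustration of this weighting is given below. It assumes that, after dropping fin and the other device-independent constants, the stiffness-region determinant is the static mean displacement and the mass-region determinant is that displacement multiplied by fr²; the three candidate designs and their values are hypothetical and only mimic cases (i) to (iii) of Fig. 1b.

```python
import math

def fom_vol(delta_st, f_r):
    """Geometric mean of the assumed stiffness- and mass-region determinants."""
    stiffness_term = delta_st            # governs the response for f_in << f_r
    mass_term = delta_st * f_r**2        # governs the response for f_in >> f_r
    return math.sqrt(stiffness_term * mass_term)

# Hypothetical designs: (mean static displacement per volt in m/V, f_r in Hz)
baseline = (1.0e-9, 5_000.0)
case_i = (2.0e-9, 2_500.0)     # larger displacement paid for with a lower f_r
case_iii = (1.5e-9, 6_000.0)   # both quantities improved

for name, (d, f) in {"baseline": baseline, "i": case_i, "iii": case_iii}.items():
    print(f"{name:9s} FOMvol = {fom_vol(d, f):.3e}  (delta_st * f_r = {d * f:.3e})")
```

Under these assumptions the geometric mean collapses to the product of the static displacement and fr, so a design that only trades one for the other (case i) keeps the same value, while a design that improves both (case iii) scores higher.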
In addition, the energy efficiency ratio is another important metric that can be characterized in terms of normalized power sensitivity. A piezoelectric MEMS speaker can be regarded electrically as a capacitor, and its power consumption per unit area can be expressed as28:
where γ is the ratio of the area covered by the piezoelectric material to the total diaphragm area. The normalized sensitivity, which represents the sound pressure level produced per milliwatt in a unit area, can be expressed as:
Eliminating the irrelevant terms in (9) and considering (6) together, we define a parameter FOMsen that can directly represent the energy efficiency ratio of a MEMS speaker:
The derivation of both FOMs as functions of design parameters requires first determining the expressions of \({\bar{\delta }}_{{st}}\) and fr.
Mean displacement
For a cantilever beam piezoelectric MEMS speaker, the beam radius of curvature R (Fig. 1c) can be obtained by applying the force boundary condition (11) to solve the piezoelectric Eq. (12).
where F and M are the resultant force and moment of the cantilever beam actuator, respectively; y is the distance to the neutral plane; zi and ti are the bottom position coordinate and thickness of each layer, respectively; Ti is the mechanical stress in the length direction of each layer; Yi is the Young’s modulus of each layer; and w(x) is the section width at location x. Since the curvature can be regarded as the second-order derivative of the deflection curve, the deflection curve function \(\delta (x)\) can be obtained by integrating the curvature twice28.
For the bimorph:
For the unimorph:
where A = Ym/Yp, Ae = Ye/Yp, B=tm/tp, Be = te/tp and tp, tm and te denote the thicknesses of the piezoelectric layer, the support layer and the electrode layer, respectively. Notably, to obtain a concise representation, the thicknesses of all electrode layers are assumed to be the same. The static air volume change ΔVst caused by the vibration of the diaphragm can be obtained via integration. Normalizing this factor based on the area and voltage yields \({\bar{\delta }}_{{st}}\):
where N is the number of array elements and f(x) is the ratio of the cross-sectional width at position x to the anchor width.
Resonant frequency
For out-of-plane bending of beams with width-line density m(x) and modal function g(x), the maximum potential energy Um and kinetic energy Km are determined as follows31:
By setting Km = Um, fr can be expressed as:
The width-line density m(x) and the equivalent moment of inertia for the piezoelectric layer I(x) are given by:
where yc is the neutral axis position and \({\phi }_{2}\) and \({\phi }_{3}\) are dimensionless functions:
For the bimorph:
For the unimorph:
g(x) is the ratio of the displacement at each point relative to that at the tip of the cantilever beam. The static deflection shape cannot be used directly as g(x), because the resonant and quasi-static mode shapes differ. However, the exact analytical expression of g(x) for a beam of arbitrary planar shape is difficult to derive and complex in form. Therefore, the Ritz method was used to obtain an approximate solution of g(x) with a simple form. Notably, the accuracy of this approach is sufficient for engineering applications. In this method, arbitrary twice-differentiable functions \({\varphi }_{i}\)(x) are chosen, and they need to satisfy only the displacement boundary conditions. Then, the stiffness matrix K and the mass matrix M can be derived32:
The eigenvalues \({\bar{\omega }}_{i}^{2}\) and their corresponding eigenvectors αi can be obtained by solving the following matrix eigenvalue problem:
The number of eigenvalues agrees with the number of \({\varphi }_{i}\)(x). By substituting the eigenvector \({\alpha }_{1}\) corresponding to the smallest eigenvalue \({\bar{\omega }}_{1}^{2}\) into (33), the Ritz approximate solution of g(x) for the first-order modes can be obtained:
Notably, the above method can be used to find the resonant frequency and mode at higher orders up to order n.
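The Ritz procedure can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the thickness is taken as constant, so both the bending stiffness and the mass per unit length scale with the width ratio; the trial functions are the monomials t², t³, ..., which satisfy the clamped-end conditions; and the function name `ritz_coefficients` is hypothetical. The routine returns dimensionless coefficients c_i such that ω_i = c_i·sqrt(E·I0/(m0·L⁴)), where I0 and m0 are the values at the anchor.

```python
import numpy as np

def ritz_coefficients(width_ratio, n_terms=4, n_quad=20):
    """Dimensionless natural-frequency coefficients of a clamped-free beam
    whose width varies as width_ratio(t), with t = x / L in [0, 1]."""
    # Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * (nodes + 1.0)
    w = 0.5 * weights * width_ratio(t)      # quadrature weight times f(t)

    p = np.arange(2, n_terms + 2)           # exponents 2, 3, ...
    phi = t[None, :] ** p[:, None]          # trial functions t**p
    d2phi = (p * (p - 1))[:, None] * t[None, :] ** (p - 2)[:, None]

    K = (d2phi * w) @ d2phi.T               # stiffness matrix
    M = (phi * w) @ phi.T                   # mass matrix

    # Turn K a = lambda M a into a symmetric standard problem using M = C C^T
    C = np.linalg.cholesky(M)
    A = np.linalg.solve(C, np.linalg.solve(C, K).T).T
    eigenvalues = np.linalg.eigvalsh(0.5 * (A + A.T))   # ascending order
    return np.sqrt(eigenvalues)

def rectangular(t):
    return np.ones_like(t)

def triangular(t):
    return 1.0 - t

c_rect = ritz_coefficients(rectangular)
c_tri = ritz_coefficients(triangular)
c_single = ritz_coefficients(rectangular, n_terms=1)   # one-term (Rayleigh) estimate

print("rectangular, first mode:", c_rect[0])
print("triangular, first mode: ", c_tri[0])
print("rectangular, one term:  ", c_single[0])
```

For the rectangular beam the first coefficient can be compared with the textbook value 1.8751² ≈ 3.516. Because the Ritz estimate is an upper bound, adding trial functions can only lower it toward that value; the single-term estimate is sqrt(20) ≈ 4.47.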
For subsequent derivations, g(x) and f(x) are normalized with respect to the beam length L to obtain \(\hat{g}\)(t) and \(\hat{f}\)(t), which are independent of any factor except shape. Below, t is equal to x/L.
Analysis of the FOMs
Substituting (35) and (18) into (7) and (10) yields the following expressions for FOMvol and FOMsen:
where Ω is a formula related only to the planar shape and not to size.
The calculated Ω values for seven different shapes are given in Fig. 2a, and the corresponding \(\hat{g}(t)\) and \(\hat{f}(t)\) values for these shapes are given in Table 1. The mean value of Ω is 1.1812 with a standard deviation of 0.0242, indicating that the variation in Ω is negligible. Therefore, Ω can be treated as a constant in subsequent analyses. Additionally, it is clear that L and w do not appear in (36) and (37). This suggests that planar factors have a negligible effect on the FOMs.
Fig. 2: a Calculated Ω values for seven different planar structures. b The irrelevance of the FOMs to beam length
The value of the dimensionless function \(\sqrt{\phi_3/\phi_1\phi_2}\) is determined by the laminated structure, which involves the selection of materials for electrodes and structural layers given the ratio between the thickness of each of these layers and the piezoelectric layer. Unlike Ω, this term cannot be treated as a constant. Thus, the FOMs are clearly influenced mainly by the layer thickness and material selected and are almost fully independent of the planar shape and size. An essential and interesting conclusion is that the correspondence between the FOMs and the planar structure is one-to-many. To illustrate this point, Fig. 2b provides a concrete example with the beam length L as the variable. When the design in the thickness direction is fixed, altering L merely involves a trade-off between the quasi-static output and the resonant frequency, with their product remaining unchanged. Similarly, replacing L in Fig. 2b with w or the planar shape \(\hat{f}\)(t) does not significantly influence the FOMs.
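The trade-off with beam length can be reproduced with elementary beam theory. The sketch below assumes a uniform rectangular cantilever whose piezoelectric layer produces a constant curvature per unit voltage, so the deflection is κx²/2 and its mean over the beam is κL²/6, while the first resonant frequency scales as 1/L². The numerical values of `kappa` and `beam_const` are hypothetical.

```python
import math

C1 = 3.516  # first-mode coefficient of a uniform cantilever (1.8751**2)

def mean_static_displacement(kappa, length):
    """Mean of delta(x) = kappa * x**2 / 2 over 0..L, per unit voltage."""
    return kappa * length**2 / 6.0

def resonant_frequency(beam_const, length):
    """f_r = C1 / (2*pi) * sqrt(E*I/m) / L**2, with beam_const = sqrt(E*I/m)."""
    return C1 / (2.0 * math.pi) * beam_const / length**2

kappa = 0.5           # curvature per volt in 1/(m*V), hypothetical
beam_const = 1.4e-3   # sqrt(E*I/m) in m^2/s, hypothetical

products = []
for length in (375e-6, 400e-6, 425e-6):
    d_st = mean_static_displacement(kappa, length)
    f_r = resonant_frequency(beam_const, length)
    products.append(d_st * f_r)
    print(f"L = {length * 1e6:.0f} um: delta_st = {d_st:.3e} m/V, "
          f"f_r = {f_r:.0f} Hz, product = {d_st * f_r:.3e}")
```

The L² growth of the mean displacement cancels the 1/L² fall of the resonant frequency, so the product depends only on the thickness-direction design, consistent with the behavior described for Fig. 2b.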
Although the derivation above was specifically focused on the cantilever beam, it is likely that similar principles apply to other structures. However, conducting a rigorous theoretical derivation for each structure is challenging. Therefore, three typical designs were selected to extend the theory through FEM simulations. They share the same thickness configurations: a unimorph structure comprising a 5 μm silicon support layer, 0.1 μm molybdenum electrodes, and a 0.5 μm AlN piezoelectric layer. Detailed planar geometries for these designs can be found in the literature4,10,33.
Herein, FOMvol was calculated by first obtaining simulated resonant frequencies fr and mean displacement \({\bar{\delta }}_{{st}}\), then incorporating them into (7). Figure 3 illustrates the trend of the simulated FOMvol as a function of the planar dimensions for these structures. Figure 3a shows the trend for a cantilever-type structure. As depicted in Fig. 3b–d, similar to the situation with cantilever beams, the FOM is almost unaffected by the planar dimensions. This indicates that the independence of the FOM from planar dimensions may be a general characteristic applicable to a wide range of structures. In contrast, the independence of the FOM from planar shape does not hold as a universal rule. This is evident from the differing FOM values observed in Fig. 3 for the four structures, despite their identical thickness structure. Based on current evidence, only the cantilever beam type has demonstrated planar shape independence of the FOMs.
The independence of FOMs from planar dimensions enables them to capture the intrinsic characteristics of a structure, remaining unaffected by variations in size. This means that the broadband characteristics of a design can be easily derived from a single example via the relevant FOMs. In other words, once the frequency response of one speaker is established, the responses of other speakers with the same design but varying sizes can be predicted. Moreover, since FOMs are the product of the quasi-static response and natural frequency, a structure with high FOMs will exhibit excellent performance at a given natural frequency. This suggests that when comparing two speakers with different resonant frequencies, adjusting their planar dimensions to match the resonant frequencies for normalized output comparison is unnecessary, and simply comparing their FOMs will suffice. The independence of FOMs from planar dimensions thus makes them convenient and practical criteria for evaluating the wideband performance of speakers.
In addition to serving as practical evaluation criteria, FOMs can aid in simplifying the design process. Design variables can be categorized into two types on the basis of their impact on the FOM(s): relevant variables and irrelevant variables. During the design process, these two types of variables can be determined independently without mutual influence. The determination of the relevant variables can be prioritized, and this process can be transformed into a single-objective, multivariable optimization problem. The corresponding mathematical model is presented as follows:
where x represents the vector of design variables and ub and lb represent the vectors of the upper and lower bounds, which are related to the fabrication process. After the relevant variables are identified, the irrelevant variables can be determined either through solving the associated equations or via simulation on the basis of the desired resonant frequency. The effective diaphragm area of the device can then be adjusted by varying the number of elements N. These two steps can be carried out without mutual interference.
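As an illustration of the first step, a bounded single-objective search can be written in a few lines. The objective `toy_fom` below is a placeholder and not the FOM expression derived in this work; it only mimics a quantity that increases as both thicknesses decrease. The lower bounds follow the minimum AlN and Mo thicknesses quoted in this paper, whereas the upper bounds and the grid resolution are hypothetical.

```python
import itertools
import numpy as np

def toy_fom(x):
    """Placeholder objective (NOT the paper's FOM): larger for thinner layers."""
    t_p, t_e = x
    return 1.0 / (t_p + 2.0 * t_e)

def maximize_on_grid(objective, lb, ub, points_per_axis=25):
    """Exhaustive search for max objective(x) subject to lb <= x <= ub."""
    axes = [np.linspace(lo, hi, points_per_axis) for lo, hi in zip(lb, ub)]
    best_x, best_value = None, -np.inf
    for candidate in itertools.product(*axes):
        value = objective(candidate)
        if value > best_value:
            best_x, best_value = candidate, value
    return np.array(best_x), best_value

lb = np.array([100e-9, 50e-9])     # t_p (AlN) and t_e (Mo) minimum thicknesses, m
ub = np.array([1000e-9, 200e-9])   # hypothetical upper bounds, m

x_opt, fom_opt = maximize_on_grid(toy_fom, lb, ub)
print("optimal thicknesses (m):", x_opt)
```

A grid search is adequate here because the design vector has only a few components; for higher-dimensional problems a gradient-based or evolutionary optimizer could replace `maximize_on_grid` without changing the problem statement.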
FOM-based design
To demonstrate the proposed design method and validate its effectiveness, we designed, fabricated and characterized a set of MEMS speakers. The objective was to develop a compact, low-voltage, low-power MEMS tweeter. To achieve this, AlN was selected as the piezoelectric material, as it has a well-established manufacturing process and a low dielectric constant. Additionally, the resonant frequency was targeted at approximately 5 kHz to ensure a balanced response across the target frequency range of 2–16 kHz. The first step was to determine the relevant variables. The corresponding mathematical model is shown in Eq. (39). Given the relatively low dimensionality of the vector x in this problem, the optimal solution for the FOM was determined by plotting the FOMs as functions of the relevant variables. As shown in Fig. 4, for the bimorph structure, the FOMvol increases monotonically as both tp and te decrease, with the steepest gradient of increase at small thicknesses. FOMsen exhibits a similar trend; however, the key difference is that for a fixed te, there is an optimal point for tp, which decreases as te decreases. For the unimorph structure, minimizing tp within the process constraints yields better results. Within the same fabrication boundaries, the bimorph structure has a higher upper limit for both efficiency and output capability than the unimorph structure does. On the basis of practical considerations, the minimum thicknesses for Mo, AlN, and Si are set to 50 nm, 100 nm, and 1 µm, respectively. Within the fabrication boundaries, the maximum value of FOMvol corresponds to the combination of a bimorph structure and the minimum permissible thickness. In this configuration, FOMsen also reaches a highly desirable level. Therefore, the designed thicknesses of the piezoelectric layers and the electrodes are set to 100 nm and 50 nm, respectively.
Fig. 4: FOM trends with respect to relevant variables
Since there are no theoretical FOM differences among planar shapes for cantilever speakers, we selected a triangular design scheme for the planar design. The dimensional parameters were specifically designed to ensure that the resonant frequency falls within the 4–6 kHz range, thereby achieving a balanced response in the mid-to-high frequency spectrum. The lengths L of the three different variants were set to 425 µm, 400 µm, and 375 µm, respectively. The details are illustrated in Fig. 5a. To evaluate the acoustic performance of the speakers, a simulation model was established using COMSOL Multiphysics 6.1, as illustrated in Fig. 5b. The speaker was connected on one side to the IEC 60318-4 ear simulator from the COMSOL library, while the opposite side was placed in a free sound field environment bounded by perfectly matched layers to absorb outgoing sound waves13. Notably, the characteristic mesh element size throughout the model was constrained to be smaller than 1/7 of the wavelength at 20 kHz (the maximum frequency) to ensure numerical accuracy. The FEM simulation results for the acoustic response, presented in Fig. 5c, reveal that all design variants exhibit a consistent peak above 10 kHz. This artifact originates from the half-wavelength standing wave resonance inherent to the IEC 60318-4 ear simulator2,13.
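The mesh criterion translates into a concrete element size once a speed of sound is chosen. The short calculation below assumes air at roughly 20 °C (343 m/s), a value that is not stated in this paper; the function name is illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def max_element_size(f_max_hz, elements_per_wavelength=7.0, c=SPEED_OF_SOUND):
    """Largest mesh element size that keeps the stated number of
    elements per acoustic wavelength at the highest simulated frequency."""
    wavelength = c / f_max_hz
    return wavelength / elements_per_wavelength

h_max = max_element_size(20e3)
print(f"wavelength at 20 kHz: {SPEED_OF_SOUND / 20e3 * 1e3:.2f} mm")
print(f"maximum element size: {h_max * 1e3:.2f} mm")
```

Under this assumption the wavelength at 20 kHz is about 17 mm, so the limit is about 2.45 mm in the air domains; the structural features of the speaker itself are far smaller and will normally impose a much finer mesh locally.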
Fig. 5: a Schematic diagram of the proposed design. b FEM modeling and setup. c Simulated SPL frequency response with a driving voltage of 2.5 Vrms
Performance results and comparison
Figure 6a, b display a photograph of the fabricated device and its surface profile obtained by the confocal microscope, respectively. Figure 6c shows the cross-sectional SEM image. To evaluate the crystalline quality of the 100-nm-thick AlN film, the X-ray rocking curve of the as-deposited AlN piezoelectric layer is shown in Fig. 6d. The fabricated piezoelectric MEMS speaker was acoustically characterized by using an IEC 60318-4 ear simulator. Under a 2.5 Vrms excitation, the SPL at 1 kHz for the three variants reached 95.4, 94.4, and 92.6 dB, respectively (Fig. 7a). Among these, the variant with a length L of 425 µm maintained an SPL above 100 dB across the core high-frequency range of 2–13 kHz, with a maximum SPL of 115 dB. The resonant frequencies were 4.5, 5, and 5.8 kHz, respectively. A noticeable low-frequency degradation was observed in the frequency band below 400 Hz due to cantilever warping induced by residual stress, which resulted in an increased slit width. Wider slits lead to more severe acoustic short-circuiting, a phenomenon where out-of-phase acoustic waves from opposite sides of the diaphragm cancel each other out. It becomes particularly pronounced in the low-frequency range1,10,21,23,24. The warping-induced slit width enlargement was not accounted for in the numerical model, resulting in discrepancies between simulated and experimentally measured responses within this frequency band. Overall, the results align well with those of the finite element simulation presented in Fig. 5c. Discrepancies may be attributed to factors not accounted for in the simulation, such as residual stresses, manufacturing tolerances in the back cavity boundaries, and viscous effects in air domains outside the slit region.
Fig. 6: a Photograph of the device fabricated on a coin. b Surface profile (L = 375 μm) obtained by the confocal microscope. c Cross-sectional SEM image. d X-ray rocking curve of the as-deposited AlN piezoelectric layer
Figure 7b, c show the normalized SPL and FOMvol. Figure 7d, e show the normalized sensitivity and FOMsen. As predicted, the FOMs for the three variants are nearly identical, demonstrating good agreement with theoretical values; designs with longer beams exhibit a superior response below the resonant frequency, whereas designs with shorter beams display a better response above the resonant frequency. Taking the variant with a beam length of 400 µm as an example, its normalized SPL at 1 and 10 kHz reaches 76.6 and 86.6 dB/mm²/Vrms, respectively, with normalized sensitivities of 91.2 and 91.5 dB/mm²/mW. The normalized performance and FOMs of four state-of-the-art MEMS speakers are also shown in Fig. 7b–e9,14,28,34. 1 kHz and 10 kHz were selected for performance comparisons, as they serve as common metrics for assessing mid- and high-frequency response characteristics. The proposed bimorph piezoelectric speaker yields the highest FOMs, normalized voltage sensitivity and normalized power sensitivity to date, indicating that the speaker is well suited for low-voltage drive conditions. The two FOMs are relatively well balanced in the proposed speaker design, with both metrics reaching high levels. This balance is attributed to the precise theoretical expressions that aid in finding optimal parameter combinations.
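For readers unfamiliar with normalized SPL figures, the sketch below shows one plausible way such a normalization can be carried out. It assumes that sound pressure scales linearly with both active area and driving voltage, so each normalization subtracts 20·log10 of the corresponding quantity; the exact convention used in this paper is not spelled out here, and the function name is hypothetical. The active area is left as an explicit argument because it is not quoted in the text.

```python
import math

def normalized_spl(spl_db, area_mm2, v_rms):
    """SPL referred to 1 mm^2 of active area and 1 Vrms of drive,
    assuming pressure is proportional to area and to voltage."""
    return spl_db - 20.0 * math.log10(area_mm2) - 20.0 * math.log10(v_rms)

# Voltage normalization alone (area argument set to 1 mm^2 so it has no effect):
spl_per_volt = normalized_spl(94.4, 1.0, 2.5)
print(f"94.4 dB at 2.5 Vrms corresponds to {spl_per_volt:.2f} dB at 1 Vrms")
```

Under this convention the 2.5 Vrms drive accounts for about 8 dB of the measured level; the remaining difference to a per-area figure depends on the active area of the device.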
Discussion
Validation of FOM theory
Despite the use of AlN, a piezoelectric material not typically known for its actuation ability, the proposed speaker with optimized FOMs demonstrates remarkable normalized output performance, exceeding the performance of a wide range of PZT-driven speakers. This finding indicates that regardless of the materials or structures used, designs with high FOMs consistently exhibit superior performance. It is hypothesized that the use of materials with a high d31, such as PZT, through the optimization of FOMs could further enhance speaker performance. Furthermore, this result strongly supports the effectiveness of the design method, in which the FOMs are optimized considering the potential of the speaker driving materials, potentially providing a basis for speaker optimization.
On the other hand, the effective range of FOM theory has been validated. The measured performance of the speaker and the theoretical predictions align well. When the diaphragm is sufficiently thin, various issues can arise, such as residual stress and deterioration of crystal quality, which make simplified theoretical analysis challenging. Therefore, experimental validation is necessary. The test results for the proposed structure indicate that the FOM theory remains valuable for reference at a total diaphragm thickness as thin as 0.4 μm, fully covering the thickness range of all existing MEMS speaker designs.
Significance and limits of FOM theory
In this work, FOMs represent the output capability across a wide frequency range and are easy to calculate, making them suitable for comparing the performance of different designs. Furthermore, FOMs can be used to guide design, as their independence with respect to planar dimensions allows the speaker design to be simplified from a multi-objective, multivariable optimization problem into a systematic problem of single-objective, multivariable optimization combined with solving the relevant equations. This strategy transforms the design of MEMS speakers from a simulation-based trial-and-error approach into a more efficient mathematical approach.
Theoretical and experimental results show that the influence of planar dimensions on the FOMs is negligible, especially for cantilever-type speakers, for which even the shape has a negligible effect on FOM values. However, this does not imply that planar shapes are not important. In fact, the FOMs do not reflect all aspects of speaker performance and only incorporate basic parameters such as output sensitivity and efficiency, which is the greatest limitation of the proposed approach. Planar factors significantly influence other aspects of performance, such as total harmonic distortion (THD), high-order modes, and the fill factor. For example, shapes with a large equivalent radiating area require less displacement to achieve the same SPL. Reduced displacement leads to weaker geometric nonlinearity, which generally results in lower THD13,14. Additionally, the planar design can be adjusted to change the positions of high-order modes, even when the first-order resonant frequency is fixed, such as to fulfill the requirements of ultrasonic applications35,36. In other words, FOM theory provides a simple way to control fundamental performance parameters such as SPL and efficiency, allowing designers to focus on other capabilities of MEMS speakers.
Although the theoretical calculations were validated through FEM simulations and experiments, the actual resonant frequency of the designed devices tends to be slightly lower than predicted. This is attributed to the fact that the radiation impedance, also referred to as the “co-vibrating air mass” in the audio field, is not considered in the model. The discrepancy becomes increasingly pronounced as the diaphragm thickness decreases. In future work, we will incorporate this factor into the model and further optimize the corresponding method.
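To give a feel for why this effect grows as the diaphragm becomes thinner, the following sketch replaces the cantilever with a rigid, baffled circular piston. This is a deliberately crude stand-in and not the model used in this work. For such a piston at low frequency, the textbook co-vibrating air mass on one side is (8/3)ρ0a³, and adding a mass m_air to a lumped resonator of mass m lowers its resonance by the factor sqrt(m / (m + m_air)). All function names and numerical values are illustrative assumptions.

```python
import math

RHO_AIR = 1.21  # kg/m^3, air density near 20 degC


def radiation_mass_baffled_piston(radius_m):
    """Low-frequency (ka << 1) co-vibrating air mass on one side of a baffled rigid piston."""
    return (8.0 / 3.0) * RHO_AIR * radius_m**3


def loaded_frequency_ratio(radius_m, thickness_m, density_kg_m3):
    """Ratio of air-loaded to unloaded resonance for a lumped mass-spring piston."""
    m_diaphragm = density_kg_m3 * math.pi * radius_m**2 * thickness_m
    m_air = radiation_mass_baffled_piston(radius_m)
    return math.sqrt(m_diaphragm / (m_diaphragm + m_air))


# Hypothetical piston: 0.5 mm radius, density 3000 kg/m^3 (illustrative values)
for thickness in (2.0e-6, 1.0e-6, 0.4e-6):
    ratio = loaded_frequency_ratio(0.5e-3, thickness, 3000.0)
    print(f"t = {thickness * 1e6:.1f} um -> f_loaded / f_unloaded = {ratio:.3f}")
```

Because the air mass depends only on the planar size while the diaphragm mass scales with thickness, the ratio moves further below 1 as the thickness shrinks. The absolute numbers from a piston model should not be read as predictions for a cantilever, whose effective mass and radiation load differ.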
Conclusion
In this work, we derive expressions for the average displacement, resonant frequency and sensitivity of cantilever beams of arbitrary shapes. We define FOMvol as the product of the quasi-static normalized voltage sensitivity and the natural frequency, and FOMsen as the product of the quasi-static normalized power sensitivity and the natural frequency. The analysis reveals that the FOMs are independent of planar parameters and that they can serve as a reliable metric for evaluating speaker designs within certain structural classes. In addition, FOMs can be used to guide speaker design. Under the guidance of FOM theory, the developed MEMS speaker demonstrates outstanding performance, with normalized sound pressure and sensitivity values higher than those of other reported speakers. The success of this speaker confirms the reliability of FOM theory, even at a total diaphragm thickness of only 0.4 μm.
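Written out, and reading the definitions above literally, the two FOMs are products of a quasi-static sensitivity and the natural frequency. The symbols S_vol, S_pow and f_0 are notation assumed here for illustration. The second line is a textbook scaling argument for a piezoelectric bending cantilever of length L and total thickness t with a fixed layer-thickness ratio. It is included only to show why the length cancels and is not the derivation given in this work.

```latex
\[
\mathrm{FOM_{vol}} = S_{\mathrm{vol}}\, f_0, \qquad
\mathrm{FOM_{sen}} = S_{\mathrm{pow}}\, f_0
\]
\[
\frac{\bar{\xi}}{V} \propto \frac{L^{2}}{t^{2}}, \qquad
f_0 \propto \frac{t}{L^{2}}
\quad\Longrightarrow\quad
\frac{\bar{\xi}}{V}\, f_0 \propto \frac{1}{t}
\]
```

Under this scaling, the product of displacement per volt and natural frequency depends on thickness but not on length or width, which is consistent with the planar independence stated above.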
FOMs offer a comprehensive basis for evaluating speaker performance across a wide bandwidth and provide a systematic and quantifiable framework for assessing MEMS speakers. The introduction of FOMs transforms the design process, shifting it from time-consuming trial-and-error methods based on FEM or LEM simulations to a streamlined, single-objective, multivariable problem-solving approach, significantly improving both efficiency and focus.
References
Wang, H. et al. Review of recent development of MEMS speakers. Micromachines 12, https://doi.org/10.3390/mi12101257 (2021).
Ma, Y. et al. A PZT MEMS loudspeaker with a quasi-closed diaphragm. Sens. Actuators A Phys. 358, https://doi.org/10.1016/j.sna.2023.114454 (2023).
Garud, M. & Pratap, R. MEMS audio speakers. J. Micromech. Microeng. 34, https://doi.org/10.1088/1361-6439/acfe86 (2023).
Wang, H., Chen, Z. & Xie, H. A high-SPL piezoelectric MEMS loud speaker based on thin ceramic PZT. Sens. Actuators A Phys. 309, https://doi.org/10.1016/j.sna.2020.112018 (2020).
Wang, H., Feng, P. X. L. & Xie, H. A dual-electrode MEMS speaker based on ceramic PZT with improved sound pressure level by phase tuning. In: Proc. IEEE 34th International Conference on Micro Electro Mechanical Systems (MEMS) 701–704 (IEEE, 2021).
Fei, Y., Zhang, H., Liu, Z., Zhang, N. & Zhou, X. Performance optimization of piezoelectric MEMS speaker with cantilever diaphragm array. In: Proc. IEEE 19th International Conference on Nano/Micro Engineered and Molecular Systems (NEMS) 1–4 (IEEE, 2024).
Stoppel, F. et al. New integrated full-range MEMS speaker for in-ear applications. In: Proc. 31st IEEE International Conference on Micro Electro Mechanical Systems (MEMS) 1068–1071 (IEEE, 2018).
Stoppel, F. et al. Novel membrane-less two-way MEMS loudspeaker based on piezoelectric dual-concentric actuators. In: Proc. 19th International Conference on Solid-State Sensors, Actuators and Microsystems (Transducers) 2047–2050 (IEEE, 2017).
Stoppel, F. et al. Highly miniaturized in-ear MEMS loudspeaker featuring high SPL. In: Proc. 22nd International Conference on Solid-State Sensors, Actuators and Microsystems (Transducers) 865–868 (IEEE, 2023).
Cheng, H.-H. et al. On the design of piezoelectric MEMS microspeaker for the sound pressure level enhancement. Sens. Actuators A Phys. 306, https://doi.org/10.1016/j.sna.2020.111960 (2020).
Cheng, H.-H. et al. Piezoelectric microspeaker using novel driving approach and electrode design for frequency range improvement. In: Proc. 33rd IEEE International Conference on Micro Electro Mechanical Systems (MEMS) 513–516 (IEEE, 2020).
Cheng, H.-H. et al. Piezoelectric MEMS microspeaker with suspension springs and dual electrode to enhance sound pressure level. In: Proc. 32nd IEEE International Conference on Micro Electro Mechanical Systems (IEEE MEMS) 767–770 (IEEE, 2019).
Gazzola, C., Zega, V., Cerini, F., Adorno, S. & Corigliano, A. On the design and modeling of a full-range piezoelectric MEMS loudspeaker for in-ear applications. J. Microelectromech. Syst. 32, 626–637, https://doi.org/10.1109/jmems.2023.3312254 (2023).
Wei, T.-C., Hu, Z.-S., Chang, S.-W. & Fang, W. On the design of piezoelectric MEMS microspeaker with high fidelity and wide bandwidth. In: Proc. IEEE 36th International Conference on Micro Electro Mechanical Systems (MEMS) 127–130 (IEEE, 2023).
Lin, C.-H. et al. Bandwidth enhancement of piezoelectric MEMS microspeaker via central diaphragm actuation and filter integration. In: Proc. IEEE 37th International Conference on Micro Electro Mechanical Systems (MEMS) 713–716 (IEEE, 2024).
Liechti, R. et al. A piezoelectric MEMS loudspeaker lumped and FEM models. In: Proc. 22nd International Conference on Thermal, Mechanical and Multi-Physics Simulation and Experiments in Microelectronics and Microsystems (EuroSimE) 1–8 (IEEE, 2021).
Hirano, Y. et al. PZT MEMS speaker integrated with silicon-parylene composite corrugated diaphragm. In: Proc. 35th IEEE International Conference on Micro Electro Mechanical Systems Conference (IEEE MEMS) 255–258 (IEEE, 2022).
Chen, Y.-C. et al. On the design of a two-way piezoelectric MEMS microspeaker based on a multi-shape cantilever array for high-frequency applications. J. Micromech. Microeng. 33, https://doi.org/10.1088/1361-6439/acceb1 (2023).
Cheng, H.-H. & Fang, W. THD improvement of piezoelectric MEMS speakers by dual cantilever units with well-designed resonant frequencies. Sens. Actuators A Phys. 377, https://doi.org/10.1016/j.sna.2024.115717 (2024).
Sun, M., Zhang, M., Liu, C. & Pang, W. Broadband MEMS speaker by single-way multi-resonance array with acoustic damping tuning: a proof of concept. In: Proc. IEEE 36th International Conference on Micro Electro Mechanical Systems (MEMS) 677–680 (IEEE, 2023).
Wang, Y., Lv, T., Zhang, J. & Yu, H. Capillary effect-based selective sealing strategy for increasing piezoelectric MEMS speaker performance. Microsyst. Nanoeng. 10, https://doi.org/10.1038/s41378-024-00753-x (2024).
Wang, Q., Hu, Z., Ruan, T. & Zhao, J. Unsealed piezoelectric MEMS speaker with rigid-flexible composite membrane. J. Micromech. Microeng. 34, https://doi.org/10.1088/1361-6439/ad42a6 (2024).
Wang, Q. et al. Obtaining high SPL piezoelectric MEMS speaker via a rigid-flexible vibration coupling mechanism. J. Microelectromech. Syst. 30, 725–732, https://doi.org/10.1109/jmems.2021.3087718 (2021).
Xu, L. et al. A piezoelectric MEMS speaker with stretchable film sealing. In: Proc. IEEE 36th International Conference on Micro Electro Mechanical Systems (MEMS) 673–676 (IEEE, 2023).
Liu, W., Huang, J., Shen, Y. & Cheng, J. Theoretical modeling of piezoelectric cantilever MEMS loudspeakers. Appl. Sci. 11, https://doi.org/10.3390/app11146323 (2021).
Liechti, R. et al. Total harmonic distortion of a piezoelectric MEMS loudspeaker in an IEC 60318-4 coupler estimation using static measurements and a nonlinear state space model. Micromachines 12, https://doi.org/10.3390/mi12121437 (2021).
Gazzola, C., Zega, V., Corigliano, A., Lotton, P. & Melon, M. A reduced-order-model-based equivalent circuit for piezoelectric micro-electro-mechanical-system loudspeakers modeling. J. Acoust. Soc. Am. 155, 1503–1514, https://doi.org/10.1121/10.0024939 (2024).
Liu, C. et al. Ultrahigh-sensitivity piezoelectric AlN MEMS speakers enabled by analytical expressions. J. Microelectromech. Syst. 31, 664–672, https://doi.org/10.1109/jmems.2022.3181666 (2022).
Kaiser, B. et al. Concept and proof for an all-silicon MEMS micro speaker utilizing air chambers. Microsyst. Nanoeng. 5, 43, https://doi.org/10.1038/s41378-019-0095-9 (2019).
IEC. Electroacoustics - Simulators of Human Head and Ear—Part 4: Occluded-Ear Simulator for the Measurement of Earphones Coupled to the Ear by means of Ear Inserts (IEC, 2010).
Herrera-May, A. L., Aguilera-Cortés, L. A., Plascencia-Mora, H., Rodríguez-Morales, Á. L. & Lu, J. Analytical modeling for the bending resonant frequency of multilayered microresonators with variable cross-section. Sensors 11, 8203–8226, https://doi.org/10.3390/s110908203 (2011).
Kinsler, L. E., Frey, A. R., Coppens, A. B. & Sanders, J. V. Fundamentals of Acoustics (John Wiley & Sons, 2000).
Chiung CL et al. Sound producing device. US 2021/0204067 A1 (2021).
USOUND. Conamara-UA-C0601-2T engineering samples datasheet. <https://usound.com/product/conamara-series-mems-speakers/> (2022).
Han, J., Saravanapavanantham, M., Chua, M. R., Lang, J. H. & Bulovic, V. A versatile acoustically active surface based on piezoelectric microstructures. Microsyst. Nanoeng. 8, 55, https://doi.org/10.1038/s41378-022-00384-0 (2022).
Zhang, Q. et al. Flexible multifunctional platform based on piezoelectric acoustics for human-machine interaction and environmental perception. Microsyst. Nanoeng. 8, https://doi.org/10.1038/s41378-022-00402-1 (2022).
Acknowledgements
We acknowledge funding from the National Key Research and Development Program (No. 2020YFB2008800). We also thank Quanning Li, Bohua Liu, Xuejiao Chen and Wenlan Guo of the MEMS laboratory at Tianjin University for their help in fabricating our devices.
Author information
Contributions
M.S. proposed the theory, derived the formula, designed and simulated the device and wrote the manuscript. M.Z. and W.P. proposed the concept, directed the project and revised the manuscript. Y.W. simulated and fabricated the device. S.G. conducted the experiments. C.S. developed the fabrication process. C.L.S. optimized the fabrication process. C.L. refined the formula. L.X. characterized the device.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sun, M., Zhang, M., Wang, Y. et al. Figure of merit for piezoelectric MEMS speakers. Microsyst Nanoeng 11, 138 (2025). https://doi.org/10.1038/s41378-025-00991-7
DOI: https://doi.org/10.1038/s41378-025-00991-7









