Main

Two-dimensional materials exhibit exceptional electronic properties even at monolayer thickness15,16,17, and van der Waals heterostructures18,19 enable fine-tuning of electronic bands20. These characteristics have allowed 2D electronics to extend scaling beyond the limits of Si technology1,2,3 and to create new fundamental device mechanisms4,5,6. As one demonstration of semiconductor devices for integrated circuits, 2D flash memory offers advantages over Si flash memory (the mainstream non-volatile memory technology) in Fowler–Nordheim tunnelling programming speed21,22,23,24 and channel-length scaling25. In recent years, the integration of 2D semiconductors has been increasingly considered by both industrial8,26,27 and academic11,12 researchers of integrated circuits. The next stage of 2D electronics should demonstrate its superiority at the system level and accelerate the transition of emerging devices from lab to fab28,29. However, 2D semiconductors cannot yet realize logic circuits comparable to those based on state-of-the-art Si technology. Combining 2D electronics with mature Si CMOS logic circuits is therefore a promising way to demonstrate the superiority of 2D electronics at the system level. Pioneering works have mainly combined 2D materials with CMOS to improve cell-level performance, for example, using an Si transistor to improve the reliability of 2D memristors14 or using graphene to broaden the spectral range of sensors30. 2D electronics should go further and use the CMOS platform to build systems with capabilities beyond those of existing technologies.

The key technology for migrating the advantages of 2D device concepts to the system level is still lacking, and developing such a systematic procedure and design methodology is extremely difficult. It should include a full-stack on-chip process, from planar integration and 3D architecture to chip packaging, and a cross-platform system design that assimilates 2D electronics into the CMOS platform.

Previous studies have already shown good progress in 2D device arrays on highly flat SiO2/Si substrates (roughness <300 pm) (refs. 25,31). These off-chip processes cannot be transferred directly to the CMOS platform because the surface of a CMOS chip is significantly rougher (typical roughness of about 1–2 nm), even after chemical–mechanical polishing, owing to the large variation in the underlying CMOS circuitry, as shown in Supplementary Fig. 1. This roughness variation introduces random stress in 2D materials and uncontrolled air gaps at the interface32, and it influences the electrical characteristics of atomically thin 2D materials33,34,35,36. Other major challenges for the on-chip process are the 3D architecture and chip packaging. Emerging device mechanisms are generally incompatible with the existing CMOS platform37,38, so a proper 3D architecture is needed to combine 2D electronics with the CMOS platform. Atomically thin 2D materials are also very sensitive to electro-thermo-mechanical (ETM) shock, so the characteristics of 2D electronics are easily degraded or damaged by traditional packaging processes39; a lossless packaging process is therefore needed.

More importantly, cross-platform system design between 2D electronics and the CMOS platform remains essentially unexplored. A simulation–verification methodology is essential to make a cross-platform chip work. Such a system design must include both 2D circuit design and 2D-CMOS compatibility verification, and it depends heavily on collaboration between researchers in emerging devices or processes and in CMOS circuit design. This is especially true for compatibility issues caused by emerging 2D device mechanisms: these mechanisms enable unprecedented performance breakthroughs but also set 2D devices apart from the mature CMOS platform. Forcing a solution to these compatibility issues purely through device technology is not advisable; converting them into a circuit interface design problem brings more system design tools to bear on them.

In this study, we present the atomic device to chip (ATOM2CHIP) technology, which addresses the challenges of 2D system integration at both the process and circuit design levels, and we demonstrate a fully functional memory chip by integrating a 2D NOR flash module on a CMOS die. Owing to a full-stack on-chip fabrication process, the resulting 2D flash chip achieves a high yield of 94.34%. The fabricated 2D flash cells feature fast 20-ns operation and low energy consumption down to 0.644 pJ per bit. Furthermore, the proposed cross-platform system design enables the 2D NOR flash chip to support instruction-driven operation, 32-bit parallelism and random access. These functions were confirmed by chip testing with a 5-MHz clock and a programming pulse width of 2.5 clock cycles. We believe that these system-level results represent an important milestone in extending the superiority of 2D electronics to real-world applications.

A 2D flash chip enabled by ATOM2CHIP technology

The proposed ATOM2CHIP blueprint is shown in Fig. 1a. The full-stack on-chip process developed in this study achieves a high yield for the 2D chip through the following processes: (1) a conformal adhesion process that integrates 2D materials on the rough CMOS die while mildly relieving the residual stress induced by the rough surface; (2) a modular 3D architecture that converts the incompatibility of emerging devices into a well-designed 2D-CMOS module interface; and (3) a 2D-friendly packaging method with region-specific electrostatic discharge (ESD) protection and low thermal and strain budgets to alleviate ETM damage. The cross-platform system design enables complex chip functions through: (1) a crosstalk-suppressing 2D flash circuit design; (2) a CMOS voltage domain design compatible with the negative and high voltages used in the 2D circuit operation modes; and (3) a 2D-aware CMOS impedance matching design that provides compatible driving and sensing ability.

Fig. 1: A full-featured 2D flash memory chip enabled by ATOM2CHIP technology.

a, The ATOM2CHIP blueprint for translating an atomic device concept into a tapeout-verified chip. b, The CMOS dies fabricated using a commercial 0.13 μm technology node. Left, an 8-inch wafer containing the fabricated CMOS dies; middle, optical image of the CMOS die; and right, functional descriptions of principal modules. More detailed information about the CMOS modules is provided in Supplementary Information section 1. c, Optical image of the 2D flash chip. The 2D flash module is integrated above the CMOS die and is connected by TGVs. d, STEM and HR-TEM images of the 2D flash chip. The STEM image confirms the integrated structure of the CMOS die and the 2D flash module. The HR-TEM images show the progressively magnified profiles of the 2D flash cell. Scale bars, 250 μm (c); 1 μm (d, left); 200 nm (d, top right); 5 nm (d, bottom right).

Using the ATOM2CHIP technology, we fabricated the 2D NOR flash chip by integrating a 2D flash module on a mature CMOS platform. Figure 1b shows an optical image of the 8-inch CMOS wafer with a magnified view of a CMOS die. The CMOS dies are manufactured using a commercial 0.13-μm technology node and integrate multiple circuit modules that handle peripheral control and manage memory operations. The principal circuit modules include an I/O module for input/output (Supplementary Fig. 2), word line, bit line and source line (WL/BL/SL) buffers serving as WL/BL/SL drivers, a sense amplifier (SA) for data readout, a power switch for voltage domain control, a power-on reset (POR) circuit and a logic control circuit. Supplementary Figs. 3–7 provide higher-magnification optical images and schematics of the individual circuits.

Figure 1c shows an optical image of our 1-Kb 2D NOR flash chip. The 2D flash module in NOR configuration is located in the central region of the CMOS die. There is a glass passivation (PA) layer for electrical isolation between the 2D module and CMOS circuits, as well as vias through the glass layer (TGVs) for the I/O interface (TGV1) and 2D-CMOS inter-module communication (TGV2). The 2D flash chip is controlled and tested by a host computer through 14 pads on TGV1 using serial communication based on the Serial Peripheral Interface protocol. All the WLs, BLs and SLs of the 2D flash module are connected to the CMOS circuitry using TGV2. The scanning transmission electron microscope (STEM) image of the fabricated chip (Fig. 1d, left) confirms the integrated structure of the 2D flash chip. The high-resolution transmission electron microscope (HR-TEM) images provide magnified views of the 2D flash cell, confirming the clean interfaces of the functional layers (Fig. 1d, right).

Full-stack on-chip process

The 2D flash module is integrated above a rough CMOS die (Fig. 2a, left) through a back-end-of-line compatible process. Figure 2a (middle) shows the overall 3D architecture of the 2D flash chip. The 2D flash module comprises floating gate transistor cells, with monolayer MoS2 serving as the channel material and HfO2/Pt/HfO2 as the memory stack. To alleviate the mismatch between 2D electronics and the CMOS platform, a modular structure is proposed (Fig. 2a, right). Directly integrating individual 2D flash cells with CMOS circuitry cell by cell could introduce severe compatibility issues, stemming from the inherent mismatch in their operating modes. Instead, our 2D flash memory core and the CMOS platform are designed and fabricated separately as different functional modules and are connected through a specially designed 2D-CMOS module interface. The compatibility issues are thereby converted into an interface design problem, with minimal adjustment to the planar integration process.

Fig. 2: The full-stack on-chip process.

a, The 3D architecture of the fabricated 2D flash chip. Left, the CMOS die serves as the substrate, with a PA layer of 800 nm for isolation and TGVs for communication. Right, modular design for converting compatibility issues to the 2D-CMOS module interface design. b, Magnified optical micrograph of the CMOS die highlighting dense random circuit routing. Inset, corresponding atomic force microscopy (AFM) image with roughness RMS of 1.35 nm (amplitude range of 5 nm). c, AFM image of the 2D flash integrated on the CMOS die (amplitude range of 8 nm). The conformal adhesion of 2D materials to the rough CMOS die surface facilitates stress relief. d, Statistical results of memory window characterization of the 2D flash. The 2D flash cells fabricated by the conformal adhesion on-chip process exhibit compact, distinguishable Vth distributions for on–off states (red solid line, 60 cells extracted from Extended Data Fig. 1a). The non-ideal behaviour, caused by yield and uniformity limitations, exhibits a broader distribution with overlap (blue dashed line). e, Schematic of the comprehensive protection in the 2D-friendly packaging. Left, region-specific ESD protection. ESD1 for WL/BL/SL, ESD2 for power/ground, ESD3 for inputs and ESD4 for outputs. The hatched areas denote the internal circuit associated with the corresponding pads. Top right, comparison of 2D specialized ultrasonic wire bonding with low thermal and strain budget (right) to conventional thermocompression approach with high thermal and strain budget (left). Bottom right, room temperature (RT) curing in a die attachment process. Scale bar, 5 μm (b,c). VDD, high power supply voltage; VSS, low power supply voltage.

The planar integration aims to tackle the yield loss caused by the rough CMOS die. The dense and random routing of the CMOS modules produces surface morphology variations with a root mean square (RMS) roughness of 1.35 nm after chemical–mechanical polishing (Fig. 2b), which induce random stress in atomically thin MoS2 and reduce the yield and uniformity of the integrated 2D flash devices. To alleviate these stresses, we developed a conformal adhesion on-chip integration process with gradual-release transfer and multi-step, multi-scale annealing (details are provided in the Methods). The AFM image in Fig. 2c confirms the conformal adhesion of 2D materials on the rough CMOS die, which supports stable channel performance and a stable dielectric environment. Supplementary Information section 2 provides further characterization of the process. Figure 2d shows that devices fabricated with our conformal adhesion on-chip planar process have tight and clearly separated threshold voltage (Vth) distributions, in contrast to the broader, overlapping distributions of the non-ideal case.
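By the usual definition, the RMS roughness is the root mean square deviation of the measured heights from their mean height. A minimal sketch of that calculation on an AFM height map follows, assuming the map is a list of rows in nanometres and that no plane or line levelling is applied (real AFM analysis normally flattens the scan first); the function names and the toy scan are illustrative, not measured data.

```python
import math


def _flatten(height_map):
    """Collect all height samples from a list-of-rows height map."""
    heights = [z for row in height_map for z in row]
    if not heights:
        raise ValueError("height map is empty")
    return heights


def rms_roughness(height_map):
    """RMS roughness Rq = sqrt(mean((z - mean(z))**2)), in the input units.

    Only the mean height is removed; no plane or line levelling is done
    (an assumption of this sketch).
    """
    heights = _flatten(height_map)
    mean_z = sum(heights) / len(heights)
    return math.sqrt(sum((z - mean_z) ** 2 for z in heights) / len(heights))


def peak_to_valley(height_map):
    """Height range max(z) - min(z), comparable to an AFM colour-scale range."""
    heights = _flatten(height_map)
    return max(heights) - min(heights)


# Toy 3 x 3 scan in nanometres (hypothetical values).
scan_nm = [
    [0.5, -1.2, 0.3],
    [0.9, -0.4, -0.6],
    [0.1, 0.8, -0.4],
]
print(f"Rq = {rms_roughness(scan_nm):.3f} nm, range = {peak_to_valley(scan_nm):.1f} nm")
```

Removing only the mean keeps the sketch short; on a tilted sample the tilt would be counted as roughness, which is why levelling normally precedes this step.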

Electronic packaging is essential for chip-scale integration, yet 2D chip packaging remains underexplored. Because sensitive 2D materials can be damaged by ESD, high temperature and mechanical stress during packaging, we developed a 2D-friendly packaging strategy (Fig. 2e) that delivers comprehensive protection. First, region-specific ESD protection is implemented for all pads (Fig. 2e, left). According to the protection requirements, four types of ESD circuit (ESD1–4) were designed and placed alongside the WL/BL/SL mini pads, the power/ground pads, the input pads and the output pads, respectively. Second, ultrasonic bonding specialized for 2D materials is performed at room temperature and low pressure (Fig. 2e, top right). This lowers the thermal and stress budgets and reduces the post-bond leakage of the 2D circuit by more than tenfold, to less than 1 pA (Supplementary Fig. 12). Third, a room-temperature-curing adhesive is used for die attachment (Fig. 2e, bottom right), further minimizing thermal damage. In addition, a photoresist encapsulation layer protects the chip against environmental degradation (Supplementary Fig. 13). Supplementary Information section 3 summarizes the detailed packaging considerations and their protective effect. Finally, a comparison of the CMOS module functions before and after integration of the 2D flash module shows that the full-stack on-chip process is back-end-of-line compatible and does not damage the CMOS modules (Supplementary Fig. 14).

Extended Data Fig. 1 shows the outstanding performance of the 2D flash cells. More than 1,000 devices were tested to verify that the full-stack on-chip process is lossless and gives high uniformity. The 2D flash cells support fast programming and erasing with 20-ns pulses and a low energy consumption of 0.644 pJ per bit. Extended Data Fig. 2 shows good retention, with 10-year non-volatility at 54.8 °C. Endurance and read-disturb tolerance were shown to exceed 10⁴ and 10⁶ cycles, respectively. Supplementary Information section 5 provides more details on 2D flash performance.

Cross-platform system design

Figure 3a shows the cross-platform compatibility verification methodology that we propose to make all modules work together. The methodology begins with the design of the 2D flash module. Because slow voltage settling limits the programming speed of NAND architectures, we use a NOR architecture to achieve fast operation. High-speed operation modes that inhibit crosstalk are designed on the basis of the fast Fowler–Nordheim tunnelling mechanism. The device and circuit parameters are then extracted. Based on the operation modes and the extracted impedance parameters, the CMOS modules are designed to be compatible with the 2D flash. Finally, the cross-platform system is validated by comprehensive simulation.

Fig. 3: The cross-platform compatibility verification methodology.

a, Schematic showing the 2D module design and 2D-compatible CMOS modules design for realizing a 2D flash memory chip. b, The Si device design in the power switch module for voltage domain compatibility with 2D flash. The isolation ring decouples source–drain from the p-substrate, allowing local negative voltage biasing. A supplemental buried N-well improves voltage tolerance for 2D flash operation. c, The 2D compatible inverter chain design within the buffer modules. Stage count and driver ratio were optimized on the basis of 2D flash load capacitance and CMOS inverter input capacitance. The output waveforms under different driver abilities were simulated by adjusting the transistor W/L ratio in the final inverter. d, Sense amplifier design optimization and readout characterization. Data sequence ‘0101’ across four WLs is simulated for reading. The BL parasitic capacitance leads to misreading of SA1 (for details, see Extended Data Fig. 4). SA2 achieves correct reading by isolating the BL parasitic capacitance and further improves readout speed by reducing the load of the readout circuit (for details, see Extended Data Fig. 5). e, Timing diagram of programming operation. The operation instructions include 8-bit commands (06H, 02H, where H represents hexadecimal), address and 4 data bytes. WL[22] is accessed for programming, and 32-bit input data is programmed in parallel to WL[22]. CS, chip select signal; SPI_SCLK, serial clock of the Serial Peripheral Interface protocol; SPI_SI, serial data input of the SPI protocol; addr, address; din, data input; clk, clock; GND, ground.

Extended Data Fig. 3 shows the crosstalk suppression design of the 2D NOR flash circuit using a half-selected scheme. Crosstalk tests at several scales, including a single device, a 4 × 4 array and a 4 × 32 array, show small mean Vth shifts of 0.024 V and −0.006 V for programming and erasing crosstalk, respectively. The crosstalk of a 2D flash cell subjected to consecutive crosstalk pulses was also examined and confirmed good crosstalk suppression. Supplementary Table 3 summarizes the operation modes of the half-selected scheme. Furthermore, the impedance parameters of the 2D flash module are extracted for designing 2D-compatible CMOS modules (Supplementary Table 4).
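The bias values actually used are listed in Supplementary Table 3 and are not reproduced here. As a hedged illustration of why a half-selected scheme both limits crosstalk and lowers the largest line voltage, the sketch below implements the generic V/2 scheme on an idealized array: the selected WL and the selected column (BL and SL assumed tied together) carry half the programming voltage with opposite signs, so only the target cell sees the full voltage, half-selected cells see half of it and all other cells see none. The programming voltage and the function names are hypothetical.

```python
def half_select_biases(n_rows, n_cols, sel_row, sel_col, v_prog):
    """Hypothetical V/2 line biases for programming one cell.

    Selected WL: +v_prog/2; selected column (BL and SL tied, an assumption):
    -v_prog/2; all other lines: 0 V.
    """
    wl = [v_prog / 2 if r == sel_row else 0.0 for r in range(n_rows)]
    col = [-v_prog / 2 if c == sel_col else 0.0 for c in range(n_cols)]
    return wl, col


def gate_stress(wl, col):
    """Gate-to-channel voltage V_WL - V_column seen by every cell."""
    return [[v_wl - v_col for v_col in col] for v_wl in wl]


# Toy 4 x 4 array, programming cell (1, 2) with a placeholder 10 V.
wl, col = half_select_biases(4, 4, sel_row=1, sel_col=2, v_prog=10.0)
for row in gate_stress(wl, col):
    print(row)
print("largest line voltage:", max(abs(v) for v in wl + col))
```

In this idealized picture no line ever exceeds half the programming voltage, which is the same mechanism that lets a half-selected scheme relax the voltage limits of the CMOS interface modules.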

With the half-selected scheme, the maximum voltage drop across the 2D-CMOS interface modules can be reduced to 7 V. This helps avoid unintended breakdown caused by high voltages in the interface modules and may also eliminate the need for a complex charge pump design. Meanwhile, the negative voltage required by the 2D flash module increases the risk of forward-biasing parasitic PN junctions in the CMOS circuitry, which would induce large leakage currents. Isolated devices are therefore designed for interface modules such as the power switch to meet these voltage requirements. As Fig. 3b shows, the isolated NMOS transistor incorporates an isolation (ISO) ring and a deep N-well that separate the device P-well (body) from the global P-well (substrate), enabling a local negative voltage to be applied. The ISO ring is biased at VMAX, the highest potential relative to the adjacent regions, to prevent forward biasing of the parasitic PN junctions. The buried N-well in the isolated device further enhances electrical isolation and suppresses latch-up, thereby increasing the voltage tolerance.

To ensure that the 2D flash chip functions correctly, the WL/BL/SL buffers and the SA must be designed to match the impedance of the 2D flash module for voltage waveform output and data readout. As shown in Fig. 3c, the inverter chain in the buffer modules was engineered using the logical effort technique to match the load (WL capacitance) and minimize the signal propagation delay for fast waveform generation (Methods). Impedance matching substantially improves the driving ability (Fig. 3c, right). Supplementary Information section 7 demonstrates the proper function of the buffer modules. Figure 3d shows the SA design optimization for accurate and fast data readout, validated by simulating the readout of a ‘0101’ data sequence from four cells across four WLs. By isolating the BL capacitance and reducing the load capacitance, SA2 (2D-compatible design; Extended Data Fig. 5) reads the data correctly and reduces the reading time by 70% compared with SA1 (non-compatible design; Extended Data Fig. 4), which misreads.

Simulation verification was performed, covering the programming, erasing and reading operation modes. Figure 3e shows the timing diagram for internal command and data transmission during the programming operation. The programming instruction includes two 8-bit command bytes, an address byte and 4 data bytes. WL[22] is addressed, and voltages are applied to 32 bits on WL[22] concurrently, achieving parallel programming. Extended Data Fig. 6 provides the timing diagrams of erasing and reading operations. These verification results confirm that the 2D flash can support instruction-driven operations, up to 32-bit parallelism and random-access ability.
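The programming instruction format described above can be sketched as a byte sequence. This is a minimal illustration, assuming that the address byte carries the WL index (0 to 31 for the 32 WLs), that the 32-bit data word is sent most-significant byte first and that bits are shifted out MSB-first on SPI_SI; none of these conventions is specified in the text, and the helper names are hypothetical. Whether CS is deasserted between the two command bytes is also not stated, so the sketch simply concatenates them.

```python
CMD_BYTES = (0x06, 0x02)  # the two 8-bit command bytes shown in Fig. 3e
N_WLS = 32                # 32 WLs x 32 bits per WL = 1 Kb


def program_frame(wl_index, data_word):
    """Hypothetical byte sequence for programming one 32-bit word on one WL."""
    if not 0 <= wl_index < N_WLS:
        raise ValueError("WL index out of range")
    if not 0 <= data_word < 2 ** 32:
        raise ValueError("data word must fit in 32 bits")
    # Most-significant data byte first (assumed byte order).
    data_bytes = list(data_word.to_bytes(4, "big"))
    return list(CMD_BYTES) + [wl_index] + data_bytes


def to_spi_bits(frame):
    """Serialize bytes MSB-first, one bit per SPI_SCLK cycle (assumed)."""
    return [(byte >> (7 - i)) & 1 for byte in frame for i in range(8)]


frame = program_frame(22, 0xDEADBEEF)
print([f"{b:02X}H" for b in frame])
print(len(to_spi_bits(frame)), "clock cycles on SPI_SI")
```

For example, program_frame(22, 0xDEADBEEF) yields the seven bytes 06H, 02H, 16H, DEH, ADH, BEH and EFH, that is, 56 serial clock cycles on SPI_SI under the assumptions above.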

Function demonstration based on full-chip test

Figure 4a shows the functional testing of the fabricated 2D NOR flash chip using a dedicated chip test system. The host computer provides a software interface and loads the test program onto the field-programmable gate array (FPGA), which then transmits the instructions to the 2D flash chip. The arbitrary waveform generator (AWG) and d.c. power supply provide the necessary clock and d.c. signals, respectively. Figure 4b shows the data flow of the 2D flash chip. When the power supply is activated, the POR circuit gives the reset bar (rstb) signal and enables the chip for normal operation. External instructions are conveyed to the logic module by the I/O module, generating three types of signal: control logic signals, address signal and data signal. Following these instructions, the power switch module adjusts the required voltage domain to each buffer, depending on the specific operation modes. The voltage pulses are then applied to the corresponding ports of the memory array through the WL/BL/SL buffers, completing the desired operation.

Fig. 4: Full function demonstration based on full-chip test.

a, Schematic of the chip test system. The AWG and d.c. power supply provide the required external clock signals (OSC) and d.c. signals, respectively. The FPGA transmits the command and data between the host computer and the I/O ports of the 2D flash chip, including the CS, SCLK, serial data input (SI) and serial data output (SO). The oscilloscope monitors pulse waveforms generated by the AWG. b, Data flow of the 2D flash chip. Modules are labelled in rectangular boxes, whereas the flow of key signals is indicated by arrows. c, Histogram of the programming accuracy across the 32 WLs after checkerboard programming. About 93.55% of cells reach the target states corresponding to the checkerboard pattern. dout, data output; rd_clk, read clock; rdbl, read bit line.

Full-chip programming and erasing tests were performed with a 5-MHz clock and a 500-ns operation pulse (one pulse lasts 2.5 clock cycles) to ensure reliable operation, as discussed in Supplementary Information section 8. The results, summarized in Supplementary Table 5, show an overall yield of 94.34%. A failure analysis (Supplementary Information section 9) showed that operational failures were primarily caused by process issues that led to channel cracks and Vth variations. This yield represents a marked advance for on-chip integration of 2D electronics at scales above 1 Kb11,40,41. Moreover, the International Technology Roadmap for Semiconductors requires a yield of approximately 89.5% in flash manufacturing42, so further optimization of our chip is expected to open the way to practical applications.
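The pulse timing follows from simple arithmetic: a 5-MHz clock has a 200-ns period, so a 2.5-cycle pulse lasts 500 ns. The sketch below encodes that relation, together with a simple pass-fraction yield helper; the pass-fraction definition is an assumption (the actual yield criterion is given in Supplementary Table 5), and the helper names are illustrative.

```python
def clock_period_ns(f_clk_hz):
    """Clock period in nanoseconds."""
    return 1e9 / f_clk_hz


def pulse_width_ns(f_clk_hz, n_cycles):
    """Pulse width when a pulse spans n_cycles clock periods (may be fractional)."""
    return n_cycles * clock_period_ns(f_clk_hz)


def yield_percent(n_pass, n_total):
    """Simple pass-fraction yield in percent (assumed definition)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_pass / n_total


print(clock_period_ns(5e6))      # 200.0 (ns, 5-MHz clock)
print(pulse_width_ns(5e6, 2.5))  # 500.0 (ns, 2.5-cycle operation pulse)
```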

As a more complex chip-level functional demonstration, we programmed a checkerboard pattern (alternating state-0 and state-1). Supplementary Table 6 provides the datasheet of the memory states before and after checkerboard programming. Figure 4c shows the programming accuracy of each row. Approximately 93.55% of the cells reached the correct states corresponding to the checkerboard pattern. Only three cells were unintentionally programmed, confirming the effectiveness of the crosstalk suppression design. Supplementary Log Data provides the original log file generated during chip testing. Supplementary Video shows the host computer validating the checkerboard programming of the chip.
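The checkerboard test compares the read-back state map with an alternating target pattern. A minimal sketch of that comparison follows, assuming that cell (0, 0) targets state-0 and that states are encoded as 0 and 1; the per-row accuracy is analogous to the quantity histogrammed in Fig. 4c, but the function names and the toy data are hypothetical, not chip results.

```python
def checkerboard(n_rows, n_cols):
    """Target pattern: cell (r, c) holds state (r + c) % 2, so (0, 0) is state-0 (assumed)."""
    return [[(r + c) % 2 for c in range(n_cols)] for r in range(n_rows)]


def row_accuracy(target, readback):
    """Fraction of cells in each row whose read-back state equals the target."""
    return [
        sum(t == m for t, m in zip(t_row, m_row)) / len(t_row)
        for t_row, m_row in zip(target, readback)
    ]


def overall_accuracy(target, readback):
    """Fraction of all cells in the array that reached their target state."""
    n_cells = sum(len(row) for row in target)
    n_correct = sum(
        t == m
        for t_row, m_row in zip(target, readback)
        for t, m in zip(t_row, m_row)
    )
    return n_correct / n_cells


# Toy 4 x 4 example with one failed cell (the chip itself has 32 WLs x 32 bits).
target = checkerboard(4, 4)
readback = [row[:] for row in target]
readback[1][3] ^= 1  # flip one cell to emulate a programming error
print(row_accuracy(target, readback))      # [1.0, 0.75, 1.0, 1.0]
print(overall_accuracy(target, readback))  # 0.9375
```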

The full-chip test yielded the following average supply currents of the peripheral circuitry at maximum parallelism: programming, 1.04 mA; erasing, 1.25 mA; and reading, 1.14 mA, corresponding to power consumptions of 5.2 mW, 6.25 mW and 5.7 mW, respectively. These values are close to those of commercial standalone NOR flash memories at similar technology nodes43,44,45. Moreover, in advanced embedded NOR flash, systematic energy optimization effectively reduces the energy consumed by peripheral circuits, making cell programming energy the dominant factor46. With its low programming energy of 0.644 pJ per bit, 2D flash therefore has great potential in advanced embedded applications. Supplementary Information section 10 provides a comprehensive comparison between 2D flash and Si flash, and Supplementary Information section 11 discusses the scalability of the 2D flash chip. Notably, because current NAND and NOR architectures are designed for silicon flash cells, extending the speed and energy advantages of 2D flash from the device level to the system level will require innovations in memory architecture tailored to the mechanisms of 2D devices.
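The quoted power figures equal the supply currents multiplied by a single supply voltage. The reported mW/mA ratios imply 5 V, one of the d.c. levels listed in the Methods, although the text does not state which rail was measured; under that assumption the conversion is:

```python
SUPPLY_V = 5.0  # assumed supply voltage, inferred from the reported mW/mA ratios

# Average peripheral supply currents at maximum parallelism (from the text), in mA.
CURRENTS_MA = {"programming": 1.04, "erasing": 1.25, "reading": 1.14}


def power_mw(current_ma, supply_v=SUPPLY_V):
    """P = I * V; with I in mA and V in volts, the result is in mW."""
    return current_ma * supply_v


for mode, i_ma in CURRENTS_MA.items():
    print(f"{mode}: {power_mw(i_ma):.2f} mW")
```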

Conclusion

We have demonstrated a full-featured 2D NOR flash chip using the ATOM2CHIP technology. The full-stack on-chip process ensures a high yield of 94.34% by addressing the random stress caused by the rough surface of the CMOS circuitry and the damage caused by conventional chip packaging. The fabricated 2D flash cells support fast 20-ns operation and a low energy consumption of 0.644 pJ per bit. The proposed cross-platform system design provides a methodology to ensure compatibility between 2D electronics based on emerging mechanisms and the mature CMOS platform. The 2D NOR flash chip is shown to be capable of instruction-driven operation, 32-bit parallelism and random access using a 5-MHz clock. This work provides a promising technical pathway for bringing 2D electronics concepts to real-world applications.

Methods

Flash chip fabrication

The CMOS circuitry was fabricated in a standard CMOS foundry using a 0.13-μm process. The received 8-inch wafer had a passivation layer approximately 800 nm thick, with pre-reserved vias at the port pads of the I/O (TGV1 region) and the WL/BL/SL buffers (TGV2 region). The wafer was diced into individual 5 mm × 5 mm dies, each containing four sets of identical circuits. Polymer-mediated delamination treatments were performed on the CMOS substrate before integrating the 2D flash. The CMOS substrate was cleaned by soaking in acetone for 12 h, followed by spin-coating with photoresist (S1818) and removal of the photoresist by soaking in N-methyl-2-pyrrolidone (NMP) for 12 h.

Direct-write lithography was used to open windows at the TGV2 region, and e-beam evaporation (EBE) was used to fill the vias with 5/500 nm Cr/Au. WLs were defined using direct-write lithography, followed by the deposition of 5/100/5 nm Cr/Au/Pt. An O2 plasma treatment (50 W, 20 s) was used to further clean and activate the surface for dielectric deposition. A 13-nm HfO2 blocking layer was deposited by thermal atomic layer deposition at 150 °C, using tetrakis(ethylmethylamino)hafnium and water as precursors. The floating gate pattern was defined by direct-write lithography, and 3-nm Pt was deposited by EBE. The O2 plasma treatment was then repeated. Subsequently, a 7-nm HfO2 tunnelling layer was deposited using the same atomic layer deposition system. Vias through the HfO2/Pt/HfO2 memory stack were defined by direct-write lithography and etched by reactive ion etching (Ar + CHF3, 175 W, 255 s), and EBE was then used to deposit a 5/50 nm Cr/Au layer to fill the vias. Monolayer MoS2 grown by chemical vapour deposition (purchased from Sixcarbon Technology) was transferred onto the memory stack using a gradual-release transfer process. With custom-made transfer equipment, the approach of the MoS2 towards the substrate was carefully controlled, with a minimum approach speed as low as 500 nm per step. Polystyrene was used as the supporting layer because its large Young's modulus helps avoid wrinkling. The polystyrene supporting layer was removed by soaking in toluene for 12 h. The MoS2 channels were patterned by direct-write lithography and etched by O2 plasma (30 W, 20 s). The sample was soaked in NMP for 12 h to remove the photoresist. To fully release stress and eliminate air gaps in the MoS2, multiple annealing steps in an N2 atmosphere (200 °C, 3 h) were performed on both the large-area films and the patterned strips; these steps also enhance the adhesion between the MoS2 and the substrate. BLs and SLs were defined by direct-write lithography, followed by the deposition of 5/100 nm Cr/Au using EBE. For the fabrication of 2D flash on a SiO2/Si substrate, the via-related steps described above are not required.

To passivate the 2D flash module, a layer of S1818 photoresist was spin-coated onto the sample. The TGV1 region of the I/O module was exposed by direct-write lithography for wire bonding. The chip was packaged using a ceramic dual-in-line package (DIP 24).

Inverter chain design of the buffer module

According to logical effort theory, the total effort, determined by the ratio of the load capacitance (10 pF in our case, including design margin) to the input capacitance of the first-stage CMOS inverter (2 fF, set by the selected CMOS technology), should be distributed across a chosen number of inverter stages to optimize the propagation delay. The propagation delay of the inverter chain in the buffer can be calculated by

$${t}_{{\rm{p}}}={t}_{{\rm{p}}0}\mathop{\sum }\limits_{j=1}^{N}\left(1+\frac{{C}_{{\rm{g}},j+1}}{\gamma {C}_{{\rm{g}},j}}\right)$$
(1)

where N is the number of stages in the inverter chain, Cg,j is the gate capacitance of the jth inverter, Cg,N+1 is defined as the load capacitance (here, the parasitic capacitance of the 2D memory array), tp0 is the intrinsic delay of the inverter and γ is a process-dependent parameter, usually close to 1.

For an optimized design, the gate capacitance (and the inverter size) should be the geometric mean of the adjacent inverters, such that

$${C}_{{\rm{g}},j}=\sqrt{{C}_{{\rm{g}},j-1}{C}_{{\rm{g}},j+1}},{\rm{where}}\;j=2,\ldots ,N$$
(2)

and the optimized propagation delay time can be written as

$${t}_{{\rm{p}}}=N{t}_{{\rm{p}}0}\left(1+\sqrt[N]{\frac{{C}_{{\rm{g}},N+1}}{{C}_{{\rm{g}},1}}}/\gamma \right)$$
(3)

Usually, Cg,1 is the minimum inverter gate capacitance for a certain process—in our work, 2 fF—and Cg,N+1 is 10 pF. Therefore, the optimized N for the inverter chain is 6 with a propagation delay of about 27.3tp0, whereas N = 4 is sufficient with a delay of around 30.7tp0 and offers benefits related to buffer size. For an inverter of each stage, the driver ratio is \(\sqrt[N]{\frac{{C}_{{\rm{g}},N+1}}{{C}_{{\rm{g}},1}}}\approx 8\), and the optimized driver chain is designed as shown in Fig. 3c.

Material characterization

The TEM-ready samples were prepared using the in situ FIB lift-out technique on an FEI Strata G4 HX dual-beam FIB scanning electron microscope. The samples were capped with sputtered electron-beam Pt and ion-beam Pt before milling. STEM and TEM images were captured with the Thermo Scientific Tecnai Z aberration-corrected transmission electron microscope at an accelerating voltage of 200 kV. Energy-dispersive spectra were obtained in STEM mode using a Super X FEI system. The AFM images of the devices were measured by an MFP-3D Origin+ (Asylum Research, Oxford Instruments) system. Optical images were captured by an optical microscope (OLYMPUS BX53M) and an extended-DOF microscope (KEYENCE VHX-6000).

Electrical measurements

The electrical characterization of the standalone 2D flash devices and the 4 × 32 array was carried out at room temperature and under atmospheric conditions (except the retention test) in a probe station (Cascade Summit 11000 type). The retention test was conducted in a customized vacuum probe station. The voltage pulses were generated using a semiconductor parameter analyser (B1500, Keysight). The waveform was captured using an oscilloscope (DPO 5204, Tektronix).

The electrical characterization of the 2D flash chip was performed with a dedicated chip test system. The arbitrary waveform generator (33120A, Agilent) provides the clock signals, which are monitored by an oscilloscope (DSOX1204A, Keysight). The d.c. power supply (E36312A, Keysight) provides the d.c. signals required for testing the chip, including −1 V, −5 V, 2 V, 3 V, 5 V and 9 V. The host computer provides a software interface and loads the test program onto the FPGA, which transmits commands from the host computer to the I/O ports of the 2D flash chip. Before testing, the packaged 2D flash chip was placed in a test socket compatible with the DIP package.