Abstract
Two-dimensional (2D) materials have extended the device scalability (refs. 1–3) of silicon (Si) technology and enabled fundamental innovations in device mechanisms (refs. 4–6). Both industry (refs. 7–9) and academia (refs. 10–13), particularly in the field of integrated circuits, are pursuing integration breakthroughs to demonstrate the superiority of 2D electronics at the system level. Despite considerable progress in both 2D material integration (refs. 11–13) and 2D-CMOS hybrid integration (ref. 14), a system that can carry the advantages of the device through to the application is still lacking. Here we report a full-featured 2D NOR flash memory chip realized by an atomic device to chip (ATOM2CHIP) technology, which combines a superior 2D electronic device as a memory core and a powerful CMOS platform to support complex instruction control. The ATOM2CHIP blueprint includes a full-stack on-chip process and a cross-platform system design, providing a complete framework to bridge the gap from an emerging device concept to an applicable chip. The full-stack on-chip process is a specially designed flow that incorporates planar integration, three-dimensional (3D) architecture and chip packaging, contributing to a high yield of 94.34% based on a full-chip test. The cross-platform system design handles both the 2D circuit design and the compatibility verification of the 2D and CMOS modules, contributing to a highly complex, instruction-driven, full-featured chip with 8-bit commands and 32-bit parallelism. These results demonstrate an efficient system integration strategy that showcases the advantages of the 2D electronic system.
Main
Two-dimensional materials exhibit exceptional electronic properties even at monolayer thickness (refs. 15–17), and the van der Waals heterostructure (refs. 18,19) enables fine-tuning of the electronic bands (ref. 20). These characteristics have enabled 2D electronics to extend the scaling ability beyond that of Si technology (refs. 1–3) and create fundamental device mechanisms (refs. 4–6). As one example among semiconductor devices for integrated circuits, 2D flash memory offers advantages in Fowler–Nordheim tunnelling programming speed (refs. 21–24) and channel length scaling (ref. 25) over Si flash memory, the mainstream non-volatile memory technology. In recent years, the integration of 2D semiconductors has been increasingly considered by both industry (refs. 8,26,27) and academic (refs. 11,12) researchers of integrated circuits. The next stage of 2D electronics should demonstrate its superiority at the system level and accelerate the transition of emerging devices from lab to fab (refs. 28,29). However, 2D semiconductors are currently unable to realize logic circuits comparable to those based on state-of-the-art Si technology. The combination of 2D electronics with mature Si CMOS logic circuits represents a promising way to demonstrate the superiority of 2D electronics at the system level. The pioneering works mainly combine 2D materials with CMOS to improve cell function or performance, such as using an Si transistor to improve the reliability of 2D memristors (ref. 14) or using graphene to broaden the spectral range of sensors (ref. 30). The 2D electronics should further use the CMOS platform to construct superior systems with abilities beyond those of existing technologies.
The essential technology to migrate the advantages of the 2D device concept to the system is lacking, and developing such a systematic procedure and design methodology is extremely difficult. It should include a full-stack on-chip process, spanning planar integration, 3D architecture and chip packaging, and a cross-platform system design that integrates 2D electronics with the CMOS platform.
Previous studies have shown good progress in fabricating 2D device arrays on highly flat SiO2/Si substrates (roughness <300 pm) (refs. 25,31). These off-chip processes cannot be directly transferred to the CMOS platform because the surface of the CMOS chip is significantly rougher (typical roughness of about 1–2 nm) even after chemical–mechanical polishing owing to the large variation of CMOS circuitry, as shown in Supplementary Fig. 1. The roughness variation introduces random stress in 2D materials and uncontrolled air gaps at the interface (ref. 32) and influences the electrical characteristics of atomically thin 2D materials (refs. 33–36). Two further challenges of the on-chip process are the 3D architecture and chip packaging. Emerging device mechanisms are generally incompatible with the existing CMOS platform (refs. 37,38), so a proper 3D architecture is needed to combine the 2D electronics with the CMOS platform. Atomically thin 2D materials are also very sensitive to electro-thermo-mechanical (ETM) shock, so the characteristics of 2D electronics are easily degraded or damaged by traditional packaging processes (ref. 39), and a lossless packaging process is needed.
More importantly, cross-platform system design between 2D electronics and the CMOS platform remains unexplored. A simulation–verification methodology is essential to make a cross-platform chip work. This system design must include both 2D circuit design and 2D-CMOS compatibility verification design, and therefore depends strongly on collaboration between researchers in emerging devices or processes and researchers in CMOS circuit design. The compatibility issues arise because the emerging mechanisms of 2D electronics, which enable unprecedented performance breakthroughs, also set these devices apart from the mature CMOS platform. Solving these issues through device technology alone is not advisable; converting them into circuit interface design problems makes more system design tools available.
In this study, we present the atomic device to chip (ATOM2CHIP) technology to address the challenges of 2D system integration at both the process and circuit design levels, demonstrating a fully functional memory chip through the integration of a 2D NOR flash module on a CMOS die. Leveraging a full-stack on-chip fabrication process, the resulting 2D flash chip achieves a high yield of 94.34%. The fabricated 2D flash cells feature 20-ns fast operation and low energy consumption down to 0.644 pJ per bit. Furthermore, the proposed cross-platform system design facilitates the functionality of the 2D NOR flash chip with instruction-driven operation, 32-bit parallelism and random access. This has been substantiated through chip testing, using a clock frequency of 5 MHz, and the programming pulse has been configured to 2.5 clock cycles. We believe that these system-level results represent an important milestone in extending the superiority of 2D electronics to real-world applications.
A 2D flash chip enabled by ATOM2CHIP technology
The proposed ATOM2CHIP blueprint is shown in Fig. 1a. The full-stack on-chip process developed in this study has produced a high yield of the 2D chip using the following processes: (1) a conformal adhesion process integrating 2D materials on rough CMOS die, with residual stress from rough surface mildly relieved; (2) a modular 3D architecture converting emerging device incompatibility to a well-designed 2D-CMOS module interface; and (3) a 2D-friendly packaging method with region-specific electrostatic discharge (ESD) protection and low thermal and strain budget packaging for ETM damage alleviation. The cross-platform system design has enabled complex chip functions as follows: (1) a crosstalk suppression 2D flash circuit design; (2) a CMOS voltage domain design compatible with negative and high voltage of 2D circuit operation mode; and (3) a 2D-aware CMOS impedance matching design for compatible drive and sense ability.
Fig. 1
a, The ATOM2CHIP blueprint for translating an atomic device concept into a tapeout-verified chip. b, The CMOS dies fabricated using a commercial 0.13 μm technology node. Left, an 8-inch wafer containing the fabricated CMOS dies; middle, optical image of the CMOS die; and right, functional descriptions of principal modules. More detailed information about the CMOS modules is provided in Supplementary Information section 1. c, Optical image of the 2D flash chip. The 2D flash module is integrated above the CMOS die and is connected by TGVs. d, STEM and HR-TEM images of the 2D flash chip. The STEM image confirms the integrated structure of the CMOS die and the 2D flash module. The HR-TEM images show the progressively magnified profiles of the 2D flash cell. Scale bars, 250 μm (c); 1 μm (d, left); 200 nm (d, top right); 5 nm (d, bottom right).
Using the ATOM2CHIP technology, we fabricated the 2D NOR flash chip, integrating 2D flash module on a mature CMOS platform. Figure 1b shows the optical image of the 8-inch CMOS wafer with a magnified view of the CMOS die. The CMOS dies are manufactured using a commercial 0.13 μm technology node, with multiple circuit modules integrated to handle peripheral control and manage memory operations. The principal circuit modules include an I/O for input/output (Supplementary Fig. 2), word line, bit line and source line (WL/BL/SL) buffers for WL/BL/SL driver circuits, a sense amplifier (SA) for data readout, a power switch for voltage domain control, a power-on reset (POR) circuit and a logic control circuit. Supplementary Figs. 3–7 provide the higher magnification optical images of the individual circuits and circuit schematics.
Figure 1c shows an optical image of our 1-Kb 2D NOR flash chip. The 2D flash module in NOR configuration is located in the central region of the CMOS die. There is a glass passivation (PA) layer for electrical isolation between the 2D module and CMOS circuits, as well as vias through the glass layer (TGVs) for the I/O interface (TGV1) and 2D-CMOS inter-module communication (TGV2). The 2D flash chip is controlled and tested by a host computer through 14 pads on TGV1 using serial communication based on the Serial Peripheral Interface protocol. All the WLs, BLs and SLs of the 2D flash module are connected to the CMOS circuitry using TGV2. The scanning transmission electron microscope (STEM) image of the fabricated chip (Fig. 1d, left) confirms the integrated structure of the 2D flash chip. The high-resolution transmission electron microscope (HR-TEM) images provide magnified views of the 2D flash cell, confirming the clean interfaces of the functional layers (Fig. 1d, right).
Full-stack on-chip process
The 2D flash module is integrated above a rough CMOS die (Fig. 2a, left) through back-end-of-line compatible integration. Figure 2a (middle) shows the overall 3D architecture of the 2D flash chip. The 2D flash module comprises the floating gate transistor cells, with monolayer MoS2 and HfO2/Pt/HfO2 serving as the channel material and the memory stack, respectively. To alleviate the contradiction between 2D electronics and the CMOS platform, a modular structure is proposed (Fig. 2a, right). The direct cell-to-cell integration of 2D flash cells with CMOS circuitry could introduce severe compatibility issues, stemming from the inherent mismatch in their operational modes. Our 2D flash memory core and CMOS platform are designed and fabricated separately as different function modules and connected through a specially designed 2D-CMOS module interface. Therefore, the compatibility issues can be effectively converted to interface design with the least adjustment in the planar integration process.
Fig. 2
a, The 3D architecture of the fabricated 2D flash chip. Left, the CMOS die serves as the substrate, with a PA layer of 800 nm for isolation and TGVs for communication. Right, modular design for converting compatibility issues to the 2D-CMOS module interface design. b, Magnified optical micrograph of the CMOS die highlighting dense random circuit routing. Inset, corresponding atomic force microscopy (AFM) image with roughness RMS of 1.35 nm (amplitude range of 5 nm). c, AFM image of the 2D flash integrated on the CMOS die (amplitude range of 8 nm). The conformal adhesion of 2D materials to the rough CMOS die surface facilitates stress relief. d, Statistical results of memory window characterization of the 2D flash. The 2D flash cells fabricated by the conformal adhesion on-chip process exhibit compact, distinguishable Vth distributions for on–off states (red solid line, 60 cells extracted from Extended Data Fig. 1a). The non-ideal behaviour, caused by yield and uniformity limitations, exhibits a broader distribution with overlap (blue dashed line). e, Schematic of the comprehensive protection in the 2D-friendly packaging. Left, region-specific ESD protection. ESD1 for WL/BL/SL, ESD2 for power/ground, ESD3 for inputs and ESD4 for outputs. The hatched areas denote the internal circuit associated with the corresponding pads. Top right, comparison of 2D specialized ultrasonic wire bonding with low thermal and strain budget (right) to conventional thermocompression approach with high thermal and strain budget (left). Bottom right, room temperature (RT) curing in a die attachment process. Scale bar, 5 μm (b,c). VDD, high power supply voltage; VSS, low power supply voltage.
The planar integration aims to tackle the yield loss caused by the rough CMOS die. The dense and random routing of CMOS modules produces surface morphology variations with a root mean square (RMS) roughness of 1.35 nm after chemical–mechanical polishing (Fig. 2b), inducing random stress in atomically thin MoS2 and reducing the yield and uniformity of integrated 2D flash devices. To alleviate these stresses, we developed a conformal adhesion on-chip integration process, with gradual-release transfer and multi-step, multi-scale annealing (details provided in the Methods). The AFM image in Fig. 2c confirms the conformal adhesion of 2D materials on the rough CMOS die, which facilitates stable channel performance and a stable dielectric environment. Supplementary Information section 2 provides more characterizations of the process. Figure 2d shows the tight and clearly separated threshold voltage (Vth) distributions for devices fabricated with our conformal adhesion on-chip planar process, compared with the non-ideal behaviour.
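As a reminder of what the quoted RMS roughness measures, the following minimal sketch computes it from a height map as the root mean square deviation of height about the mean level. The function name `rms_roughness` and the sample heights are illustrative assumptions and are not AFM data from this work; a real AFM workflow would normally also remove tilt and bow before this step.

```python
import math


def rms_roughness(height_map):
    """Return the RMS roughness of a 2D height map, in the input unit.

    Heights are taken relative to their mean. No plane fit or flattening
    is applied here, which real AFM analysis would usually add first.
    """
    heights = [z for row in height_map for z in row]
    if not heights:
        raise ValueError("height map is empty")
    mean = sum(heights) / len(heights)
    return math.sqrt(sum((z - mean) ** 2 for z in heights) / len(heights))


# Hypothetical 2 x 3 height map in nanometres (illustrative values only).
sample = [[0.0, 2.0, 0.0],
          [2.0, 0.0, 2.0]]
print(f"RMS roughness: {rms_roughness(sample):.2f} nm")
```

The same definition applies to both surfaces compared in the text, so the sub-nanometre and nanometre-scale figures are directly comparable as long as the scan size and flattening are the same.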
Electronic packaging is essential for chip-scale integration, yet 2D chip packaging remains underexplored. As sensitive 2D materials can be damaged by ESD, high temperature and mechanical stress during packaging, we developed a 2D-friendly packaging strategy (Fig. 2e) that delivers comprehensive protection. First, region-specific ESD protection is implemented for all pads (Fig. 2e, left). According to protection requirements, four types of ESD circuit (ESD1–4) were designed and positioned alongside the WL/BL/SL mini pads, power/ground pads, input pads and output pads, respectively. Second, ultrasonic bonding specialized for 2D materials is conducted at room temperature and low pressure (Fig. 2e, top right). This decreases the thermal and stress budgets and reduces post-bond leakage of the 2D circuit by more than tenfold to less than 1 pA (Supplementary Fig. 12). Third, an adhesive that cures at room temperature is used for die attachment (Fig. 2e, bottom right), which further minimizes thermal damage. In addition, a photoresist encapsulation layer protects the chip against environmental degradation (Supplementary Fig. 13). Supplementary Information section 3 summarizes the detailed packaging considerations and protection effect. Finally, a comparison of CMOS module function before and after integration of the 2D flash module shows that the full-stack on-chip process is back-end-of-line compatible and does not damage the CMOS modules (Supplementary Fig. 14).
Extended Data Fig. 1 shows the outstanding performance of 2D flash cells. More than 1,000 devices were tested to verify the lossless full-stack on-chip process with high uniformity. The 2D flash cells support fast programming and erasing in 20 ns and low energy consumption, evaluated to be 0.644 pJ per bit. Extended Data Fig. 2 shows good retention, with 10-year non-volatility at 54.8 °C. Endurance and read disturb tolerance have been shown to exceed \(10^{4}\) and \(10^{6}\) cycles, respectively. Supplementary Information section 5 discusses more details on 2D flash performance.
Cross-platform system design
Figure 3a shows a cross-platform compatibility verification methodology that we proposed to make all the modules work together. This methodology begins with the design of the 2D flash module. As the slow voltage settling of NAND limits its programming speed, we use NOR architecture to realize fast operation. High-speed operation modes that inhibit crosstalk are designed on the basis of the fast Fowler–Nordheim mechanism. The device and circuit parameters are then extracted. Based on the operation mode and the extracted impedance parameters, the CMOS modules are designed to ensure compatibility with the 2D flash. Finally, the cross-platform system is validated by a comprehensive simulation.
Fig. 3
a, Schematic showing the 2D module design and 2D-compatible CMOS modules design for realizing a 2D flash memory chip. b, The Si device design in the power switch module for voltage domain compatibility with 2D flash. The isolation ring decouples source–drain from the p-substrate, allowing local negative voltage biasing. A supplemental buried N-well improves voltage tolerance for 2D flash operation. c, The 2D compatible inverter chain design within the buffer modules. Stage count and driver ratio were optimized on the basis of 2D flash load capacitance and CMOS inverter input capacitance. The output waveforms under different driver abilities were simulated by adjusting the transistor W/L ratio in the final inverter. d, Sense amplifier design optimization and readout characterization. Data sequence ‘0101’ across four WLs is simulated for reading. The BL parasitic capacitance leads to misreading of SA1 (for details, see Extended Data Fig. 4). SA2 achieves correct reading by isolating the BL parasitic capacitance and further improves readout speed by reducing the load of the readout circuit (for details, see Extended Data Fig. 5). e, Timing diagram of programming operation. The operation instructions include 8-bit commands (06H, 02H, where H represents hexadecimal), address and 4 data bytes. WL[22] is accessed for programming, and 32-bit input data is programmed in parallel to WL[22]. CS, chip select signal; SPI_SCLK, serial clock of the Serial Peripheral Interface protocol; SPI_SI, serial data input of the SPI protocol; addr, address; din, data input; clk, clock; GND, ground.
Extended Data Fig. 3 shows the crosstalk suppression design of the 2D NOR flash circuit with a half-selected scheme. Crosstalk tests at several scales, including a single device, a 4 × 4 array and a 4 × 32 array, show slight mean Vth shifts of 0.024 V and −0.006 V for programming and erasing crosstalk, respectively. The response of a 2D flash cell to consecutive crosstalk pulses was also examined, demonstrating good crosstalk suppression ability. Supplementary Table 3 summarizes the operation mode with the half-selected scheme. Furthermore, the impedance parameter of the 2D flash module is extracted for designing 2D-compatible CMOS modules (Supplementary Table 4).
The maximum voltage drop across the 2D-CMOS interface modules can be reduced to 7 V with the half-selected scheme. This helps to avoid unintended breakdown due to the high voltage in the interface modules, and the complex charge pump design may also be eliminated. Meanwhile, the negative voltage required for the 2D flash module can increase the risk of forward-biasing parasitic PN junctions in CMOS circuitry, thereby inducing huge leakage current. Therefore, the isolated devices are designed for interface modules, such as power switch, to meet the voltage requirements. As Fig. 3b shows, the isolated NMOS transistor incorporates an isolation (ISO) ring and a deep N-well to separate the device P-well (body) from the global P-well (substrate), enabling local negative-voltage application. The ISO ring is biased at VMAX—the highest potential relative to adjacent regions—to prevent forward biasing of parasitic PN junctions. The buried N-well in the isolated device further enhances the electrical isolation and suppresses latch-up, thereby increasing the voltage tolerance.
To ensure the 2D flash chip functions, the WL/BL/SL buffers and SA must be designed to match the impedance of the 2D flash module for voltage waveform output and data readout. As shown in Fig. 3c, the inverter chain incorporated within the buffer modules was engineered with the logical effort technique to match the load (WL capacitance) and minimize signal propagation delay for fast waveform generation (Methods). The ability of the driver is substantially improved with impedance matching (Fig. 3c, right). Supplementary Information section 7 demonstrates the proper function of buffer modules. Figure 3d shows the SA design optimization for accurate and fast data readout, validated by simulating a ‘0101’ data sequence readout from four cells across four WLs. By isolating BL capacitance and reducing load capacitance, SA2 (with 2D-compatible design; Extended Data Fig. 5) reduces the reading time by 70% and achieves correct readout compared with SA1 (non-compatible design; Extended Data Fig. 4).
Simulation verification was performed, covering the programming, erasing and reading operation modes. Figure 3e shows the timing diagram for internal command and data transmission during the programming operation. The programming instruction includes two 8-bit command bytes, an address byte and 4 data bytes. WL[22] is addressed, and voltages are applied to 32 bits on WL[22] concurrently, achieving parallel programming. Extended Data Fig. 6 provides the timing diagrams of erasing and reading operations. These verification results confirm that the 2D flash can support instruction-driven operations, up to 32-bit parallelism and random-access ability.
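The instruction format described above (two 8-bit commands, an address byte and 4 data bytes) can be sketched as a byte sequence. This is only an illustration under stated assumptions: in common SPI NOR flash command sets 06H is write enable and 02H is page program, but whether this chip assigns the same meanings, whether 06H is sent as a separate frame, the use of the WL index as the address byte and the most-significant-byte-first data order are all assumptions here. The names `WRITE_ENABLE`, `PROGRAM` and `build_program_frames` are hypothetical.

```python
WRITE_ENABLE = 0x06  # command byte 06H from the timing diagram (meaning assumed)
PROGRAM = 0x02       # command byte 02H from the timing diagram (meaning assumed)


def build_program_frames(word_line, data_word):
    """Return two byte sequences for programming one 32-bit word.

    Assumptions: 06H is sent as its own frame before 02H, the address
    byte is the WL index, and data bytes go most-significant first.
    """
    if not 0 <= word_line < 32:
        raise ValueError("word line must be in 0..31")
    if not 0 <= data_word < 2 ** 32:
        raise ValueError("data must fit in 32 bits")
    data_bytes = data_word.to_bytes(4, byteorder="big")
    return bytes([WRITE_ENABLE]), bytes([PROGRAM, word_line]) + data_bytes


# Example: address WL[22] with an arbitrary illustrative data word.
enable, program = build_program_frames(22, 0xA5A5A5A5)
print(enable.hex(), program.hex())
```

Each of the 32 data bits then maps to one bit line on the addressed word line, which is what allows the whole word to be programmed in parallel.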
Function demonstration based on full-chip test
Figure 4a shows the functional testing of the fabricated 2D NOR flash chip using a dedicated chip test system. The host computer provides a software interface and loads the test program onto the field-programmable gate array (FPGA), which then transmits the instructions to the 2D flash chip. The arbitrary waveform generator (AWG) and d.c. power supply provide the necessary clock and d.c. signals, respectively. Figure 4b shows the data flow of the 2D flash chip. When the power supply is activated, the POR circuit gives the reset bar (rstb) signal and enables the chip for normal operation. External instructions are conveyed to the logic module by the I/O module, generating three types of signal: control logic signals, address signal and data signal. Following these instructions, the power switch module adjusts the required voltage domain to each buffer, depending on the specific operation modes. The voltage pulses are then applied to the corresponding ports of the memory array through the WL/BL/SL buffers, completing the desired operation.
Fig. 4
a, Schematic of the chip test system. The AWG and d.c. power supply provide the required external clock signals (OSC) and d.c. signals, respectively. The FPGA transmits the command and data between the host computer and the I/O ports of the 2D flash chip, including the CS, SCLK, serial data input (SI) and serial data output (SO). The oscilloscope monitors pulse waveforms generated by the AWG. b, Data flow of the 2D flash chip. Modules are labelled in rectangular boxes, whereas the flow of key signals is indicated by arrows. c, Histogram of the programming accuracy across the 32 WLs after checkerboard programming. About 93.55% of cells reach the target states corresponding to the checkerboard pattern. dout, data output; rd_clk, read clock; rdbl, read bit line.
Full-chip programming and erasing tests were performed under a 5 MHz clock with a 500-ns operation pulse (one pulse lasts for 2.5 clock cycles) to ensure reliable operation, as discussed in Supplementary Information section 8. The results are summarized in Supplementary Table 5, showing an overall yield of 94.34%. A failure analysis (Supplementary Information section 9) showed that operational failures were primarily caused by process issues, which led to channel cracks and Vth variations. Our yield represents a marked advance in the integration of 2D electronics above the 1-Kb scale with an on-chip process (refs. 11,40,41). Moreover, the International Technology Roadmap for Semiconductors requires a yield of approximately 89.5% in flash manufacturing (ref. 42), so further optimization of our chip is expected to lead to practical applications.
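The relation between the clock frequency and the operation pulse width can be written out explicitly. The symbols \(T_{\rm{clk}}\), \(f_{\rm{clk}}\) and \(t_{\rm{pulse}}\) are introduced here for illustration only.

```latex
\[
T_{\rm{clk}} = \frac{1}{f_{\rm{clk}}} = \frac{1}{5\ {\rm{MHz}}} = 200\ {\rm{ns}},
\qquad
t_{\rm{pulse}} = 2.5\,T_{\rm{clk}} = 2.5 \times 200\ {\rm{ns}} = 500\ {\rm{ns}}
\]
```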
As a more complex chip-level function demonstration, the test of programming a checkerboard pattern (a pattern of alternating state-0 and state-1) was performed. Supplementary Table 6 provides the datasheet of the memory states before and after the checkerboard programming. Figure 4c shows the programming accuracy of each row. Approximately 93.55% of the cells achieved the correct states corresponding to the checkerboard pattern. Only three cells were unintentionally programmed, confirming effective crosstalk suppression design. Supplementary Log Data provides the original log file generated during the chip testing process. Supplementary Video shows the process of chip checkerboard programming validation by the host computer.
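The per-row accuracy metric can be sketched in a few lines. The sketch assumes a 32 × 32 array (consistent with 1 Kb, 32 WLs and 32-bit words), assumes that the pattern starts with state-0 in the first cell, and uses a made-up read-back with a single flipped cell; none of the values are measured data, and the function names are hypothetical.

```python
def checkerboard(rows=32, cols=32):
    """Return a rows x cols pattern of alternating state-0 and state-1."""
    return [[(r + c) % 2 for c in range(cols)] for r in range(rows)]


def row_accuracy(target, readback):
    """Return, for each row, the fraction of cells matching the target."""
    return [
        sum(t == x for t, x in zip(t_row, x_row)) / len(t_row)
        for t_row, x_row in zip(target, readback)
    ]


target = checkerboard()
readback = [row[:] for row in target]  # start from a perfect read-back
readback[0][0] ^= 1                    # hypothetical failed cell, illustration only
per_row = row_accuracy(target, readback)
overall = sum(per_row) / len(per_row)
print(f"row 0: {per_row[0]:.4f}, overall: {overall:.4%}")
```

Because every row has the same number of cells, the mean of the per-row accuracies equals the fraction of correct cells over the whole array.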
The full-chip test yielded the following average supply currents of the peripheral circuitry at maximum parallelism: programming, 1.04 mA; erasing, 1.25 mA; and reading, 1.14 mA, corresponding to power consumptions of 5.2 mW, 6.25 mW and 5.7 mW, respectively. These values are close to those of commercial standalone NOR flash devices at similar technology nodes (refs. 43–45). Moreover, advanced embedded NOR flash with systematic energy consumption optimization effectively reduces the energy consumed by peripheral circuits, making cell programming energy the dominant factor (ref. 46). The 2D flash, with a low programming energy consumption of 0.644 pJ per bit, therefore has great potential in advanced embedded applications. Supplementary Information section 10 provides a comprehensive comparison between 2D flash and Si flash. Supplementary Information section 11 discusses the scalability of the 2D flash chip. Notably, as the current NAND and NOR architectures are designed for silicon flash cells, extending the speed and energy advantages of 2D flash from the device to the system level will require a memory architecture tailored to the mechanisms of 2D devices.
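The quoted currents and powers are related by \(P = V I\). The supply voltage is not stated alongside these figures; the value of 5 V used below is inferred from the ratio of each power to its current and should be read as an assumption. The symbols are introduced here for illustration.

```latex
\[
\begin{aligned}
P_{\rm{program}} &= V_{\rm{supply}} I_{\rm{program}} = 5\ {\rm{V}} \times 1.04\ {\rm{mA}} = 5.2\ {\rm{mW}},\\
P_{\rm{erase}} &= V_{\rm{supply}} I_{\rm{erase}} = 5\ {\rm{V}} \times 1.25\ {\rm{mA}} = 6.25\ {\rm{mW}},\\
P_{\rm{read}} &= V_{\rm{supply}} I_{\rm{read}} = 5\ {\rm{V}} \times 1.14\ {\rm{mA}} = 5.7\ {\rm{mW}}
\end{aligned}
\]
```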
Conclusion
We have demonstrated a full-featured 2D NOR flash chip using the ATOM2CHIP technology. The full-stack on-chip process ensures a high yield of 94.34% by addressing random stress resulting from random roughness of the CMOS circuitry and damage from conventional chip packaging. The fabricated 2D flash cells support 20-ns fast operation and 0.644-pJ per bit low energy consumption. The proposed cross-platform system design provides a methodology to ensure compatibility between 2D electronics using emerging mechanisms and the mature CMOS platform. The 2D NOR flash chip is demonstrated to be capable of instruction-driven operation, 32-bit parallelism and random access using a 5-MHz clock. This work provides a promising technical pathway for bringing 2D electronics concepts to real-world applications.
Methods
Flash chip fabrication
The CMOS circuitry was fabricated in a standard CMOS foundry using a 0.13-μm process. The received 8-inch wafer had a passivation layer thickness of approximately 800 nm, with pre-reserved vias at the port pads of I/O (TGV1 region) and WL/BL/SL buffers (TGV2 region). The wafer was cut into individual dies, each with a dimension of 5 mm × 5 mm (four sets of identical circuits included). Polymer-mediated delamination treatments were performed on the CMOS substrate before integrating 2D flash. The CMOS substrate was cleaned by soaking in acetone for 12 h, followed by spin-coating with photoresist (S1818) and removal of the photoresist using N-methyl-2-pyrrolidone (NMP) soak for 12 h.
Direct-write lithography was used to expose windows at the TGV2 region, and e-beam evaporation (EBE) was used to fill the vias with 5/500 nm Cr/Au. WLs were defined using direct-write lithography, followed by the deposition of 5/100/5 nm Cr/Au/Pt. An O2 plasma treatment (50 W, 20 s) was used to further clean and activate the surface for dielectric deposition. A 13-nm HfO2 blocking layer was deposited using thermal atomic layer deposition, in which tetrakis(ethylmethylamino)hafnium reacts with water at 150 °C to form HfO2. The floating gate pattern was defined by direct-write lithography, and 3-nm Pt was deposited by EBE. The O2 plasma treatment was performed again. Subsequently, a 7-nm HfO2 tunnelling layer was deposited using the same atomic layer deposition system. Vias through the HfO2/Pt/HfO2 memory stack were defined by direct-write lithography and etched using reactive ion etching (Ar + CHF3, 175 W, 255 s), and EBE was then used to deposit a 5/50 nm Cr/Au layer to fill the vias.
Chemical vapour deposition monolayer MoS2 (purchased from Sixcarbon Technology) was transferred onto the memory stack using a gradual-release transfer process. The minimum approach speed between MoS2 and the substrate was carefully controlled to be as low as 500 nm per step using custom-made transfer equipment. Polystyrene was used as the supporting layer because its large Young’s modulus helps to avoid wrinkling. The polystyrene supporting layer was removed by soaking in toluene for 12 h. The MoS2 channels were patterned by direct-write lithography and etched by O2 plasma (30 W, 20 s). The sample was soaked in NMP for 12 h to remove the photoresist. To fully release stress and remove air gaps in MoS2, multiple annealing processes in an N2 atmosphere (200 °C, 3 h) were performed for both large-area films and patterned strips. The adhesion between MoS2 and the substrate can also be enhanced during these processes.
BLs and SLs were defined by direct-write lithography, followed by the deposition of 5/100 nm Cr/Au using EBE. For the fabrication of the 2D flash on a SiO2/Si substrate, the process involving the vias mentioned above is not required.
To passivate the 2D flash module, a layer of S1818 photoresist was spin-coated onto the sample. The TGV1 region of the I/O module was exposed by direct-write lithography for wire bonding. The chip was packaged using a ceramic dual-in-line package (DIP 24).
Inverter chain design of the buffer module
According to the logical effort theory, the total logical effort, determined by the ratio of the load capacitance (10 pF in our case, considering design margin) to the inherent input capacitance of the first-stage CMOS inverter (2 fF, decided by selected CMOS technology), should be distributed across a chosen number of inverter stages for an optimized propagation delay time. The propagation delay time of the inverter chain in the buffer can be calculated by
\[{t}_{{\rm{p}}}={t}_{{\rm{p}}0}\sum _{j=1}^{N}\left(1+\frac{{C}_{{\rm{g}},j+1}}{\gamma {C}_{{\rm{g}},j}}\right)\]
where N is the number of stages of the inverter chain, \({C}_{{\rm{g}},j}\) is the gate capacitance for the jth inverter, \({C}_{{\rm{g}},N+1}\) is defined as the capacitance load, here the parasitic capacitance of the 2D memory array, \({t}_{{\rm{p}}0}\) is the intrinsic delay for the inverter and γ is a parameter dependent on the process, usually near 1.
For an optimized design, the gate capacitance (and the inverter size) should be the geometric mean of the adjacent inverters, such that
\[{C}_{{\rm{g}},j}=\sqrt{{C}_{{\rm{g}},j-1}{C}_{{\rm{g}},j+1}}\]
and the optimized propagation delay time can be written as
\[{t}_{{\rm{p}}}=N{t}_{{\rm{p}}0}\left(1+\frac{1}{\gamma }\sqrt[N]{\frac{{C}_{{\rm{g}},N+1}}{{C}_{{\rm{g}},1}}}\right)\]
Usually, \({C}_{{\rm{g}},1}\) is the minimum inverter gate capacitance for a given process (2 fF in our work), and \({C}_{{\rm{g}},N+1}\) is 10 pF. Therefore, the optimized N for the inverter chain is 6 with a propagation delay of about 27.3\({t}_{{\rm{p}}0}\), whereas N = 4 is sufficient with a delay of around 30.7\({t}_{{\rm{p}}0}\) and offers benefits related to buffer size. For an inverter of each stage, the driver ratio is \(\sqrt[N]{\frac{{C}_{{\rm{g}},N+1}}{{C}_{{\rm{g}},1}}}\approx 8\), and the optimized driver chain is designed as shown in Fig. 3c.
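The stage-count trade-off can be explored numerically. The sketch below assumes the standard equal-taper result for an inverter chain, in which the delay in units of tp0 is N(1 + F^(1/N)/γ) with F the ratio of load capacitance to first-stage input capacitance; the function names and the choice γ = 1 are assumptions made for illustration. Under these assumptions, the quoted delays of about 27.3tp0 (N = 6) and 30.7tp0 (N = 4) correspond to a capacitance ratio close to 2,000, whereas the 10 pF and 2 fF values with design margin give a ratio of 5,000 and a per-stage ratio near 8 for N = 4, so both ratios are evaluated.

```python
def chain_delay(n_stages, cap_ratio, gamma=1.0):
    """Delay of an equally tapered N-stage inverter chain, in units of tp0."""
    stage_ratio = cap_ratio ** (1.0 / n_stages)
    return n_stages * (1.0 + stage_ratio / gamma)


def best_stage_count(cap_ratio, gamma=1.0, max_stages=12):
    """Return the stage count in 1..max_stages with the smallest delay."""
    return min(
        range(1, max_stages + 1),
        key=lambda n: chain_delay(n, cap_ratio, gamma),
    )


# Both capacitance ratios are inputs: 5000 from 10 pF / 2 fF, 2000 as the
# value that reproduces the quoted delays with gamma = 1 (an inference).
for ratio in (2000, 5000):
    for n in (4, 6):
        stage = ratio ** (1.0 / n)
        print(f"F={ratio}, N={n}: stage ratio {stage:.2f}, "
              f"delay {chain_delay(n, ratio):.1f} tp0")
```

The delay curve is shallow near its minimum, which is why a shorter chain can be chosen for area reasons at only a modest delay penalty.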
Material characterization
The TEM-ready samples were prepared using the in situ FIB lift-out technique on an FEI Strata G4 HX dual-beam FIB scanning electron microscope. The samples were capped with sputtered electron-beam Pt and ion-beam Pt before milling. STEM and TEM images were captured with the Thermo Scientific Tecnai Z aberration-corrected transmission electron microscope at an accelerating voltage of 200 kV. Energy-dispersive spectra were obtained in STEM mode using a Super X FEI system. The AFM images of the devices were measured by an MFP-3D Origin+ (Asylum Research, Oxford Instruments) system. Optical images were captured by an optical microscope (OLYMPUS BX53M) and an extended-DOF microscope (KEYENCE VHX-6000).
Electrical measurements
The electrical characterization of the standalone 2D flash devices and the 4 × 32 array was carried out at room temperature and under atmospheric conditions (except the retention test) in a probe station (Cascade Summit 11000 type). The retention test was conducted in a customized vacuum probe station. The voltage pulses were generated using a semiconductor parameter analyser (B1500, Keysight). The waveform was captured using an oscilloscope (DPO 5204, Tektronix).
The electrical characterization of the 2D flash chip was performed with a dedicated chip test system. An arbitrary waveform generator (33120A, Agilent) provided the clock signals, which were monitored by an oscilloscope (DSOX1204A, Keysight). A d.c. power supply (E36312A, Keysight) provided the d.c. signals required for testing the chip, including −1 V, −5 V, 2 V, 3 V, 5 V and 9 V. The host computer provided a software interface and loaded the test program onto the FPGA, which transmitted the commands from the host computer to the I/O ports of the 2D flash chip. The packaged 2D flash chip was placed into a test socket compatible with the DIP package before testing.
Data availability
The data that support the plots in this paper and other findings of this study are available from the corresponding authors upon request. Source data are provided with this paper.
References
Desai, S. B. et al. MoS2 transistors with 1-nanometer gate lengths. Science 354, 99–102 (2016).
Wu, F. et al. Vertical MoS2 transistors with sub-1-nm gate lengths. Nature 603, 259–264 (2022).
Jiang, J., Xu, L., Qiu, C. & Peng, L.-M. Ballistic two-dimensional InSe transistors. Nature 616, 470–475 (2023).
Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43–50 (2018).
Liu, C. et al. Small footprint transistor architecture for photoswitching logic and in situ memory. Nat. Nanotechnol. 14, 662–667 (2019).
Yasuda, K. et al. Ultrafast high-endurance memory based on sliding ferroelectrics. Science 385, 53–56 (2024).
Verreck, D. et al. The promise of 2-D materials for scaled digital and analog applications. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC), 26–28 (IEEE, 2023).
Dorow, C. J. et al. Exploring manufacturability of novel 2D channel materials: 300 mm wafer-scale 2D NMOS & PMOS using MoS2, WS2, & WSe2. In Proc. 2023 International Electron Devices Meeting (IEDM), 1–4 (IEEE, 2023).
IRDS. International roadmap for devices and systems: 2023 update. IEEE https://irds.ieee.org/images/files/pdf/2023/2023IRDS_BC.pdf (2023).
Publishing 2D materials research towards industrialisation. Nat. Commun. 13, 1918 (2022).
Jayachandran, D. et al. Three-dimensional integration of two-dimensional field-effect transistors. Nature 625, 276–281 (2024).
Lu, D. et al. Monolithic three-dimensional tier-by-tier integration via van der Waals lamination. Nature 630, 340–345 (2024).
Wachter, S., Polyushkin, D. K., Bethge, O. & Mueller, T. A microprocessor based on a two-dimensional semiconductor. Nat. Commun. 8, 14948 (2017).
Zhu, K. et al. Hybrid 2D–CMOS microchips for memristive applications. Nature 618, 57–62 (2023).
Castro Neto, A. H., Guinea, F., Peres, N. M. R., Novoselov, K. S. & Geim, A. K. The electronic properties of graphene. Rev. Mod. Phys. 81, 109–162 (2009).
Manzeli, S., Ovchinnikov, D., Pasquier, D., Yazyev, O. V. & Kis, A. 2D transition metal dichalcogenides. Nat. Rev. Mater. 2, 17033 (2017).
Liu, C. et al. Two-dimensional materials for next-generation computing technologies. Nat. Nanotechnol. 15, 545–557 (2020).
Novoselov, K. S., Mishchenko, A., Carvalho, A. & Castro Neto, A. 2D materials and van der Waals heterostructures. Science 353, aac9439 (2016).
Geim, A. K. & Grigorieva, I. V. Van der Waals heterostructures. Nature 499, 419–425 (2013).
Zeng, Q. et al. Band engineering for novel two-dimensional atomic layers. Small 11, 1868–1884 (2015).
Liu, L. et al. Ultrafast non-volatile flash memory based on van der Waals heterostructures. Nat. Nanotechnol. 16, 874–881 (2021).
Wu, L. et al. Atomically sharp interface enabled ultrahigh-speed non-volatile memory devices. Nat. Nanotechnol. 16, 882–887 (2021).
Huang, X. et al. An ultrafast bipolar flash memory for self-activated in-memory computing. Nat. Nanotechnol. 18, 486–492 (2023).
Yu, J. et al. Simultaneously ultrafast and robust two-dimensional flash memory devices based on phase-engineered edge contacts. Nat. Commun. 14, 5662 (2023).
Jiang, Y. et al. A scalable integration process for ultrafast two-dimensional flash memory. Nat. Electron. 7, 868–875 (2024).
Chung, Y.-Y. et al. First demonstration of GAA monolayer-MoS2 nanosheet nFET with 410 μA/μm ID at 1 V VD at 40 nm gate length. In Proc. 2022 International Electron Devices Meeting (IEDM), 34.5.1–34.5.4 (IEEE, 2022).
Smets, Q. et al. Scaling of double-gated WS2 FETs to sub-5nm physical gate length fabricated in a 300mm FAB. In Proc. 2021 IEEE International Electron Devices Meeting (IEDM), 34.2.1–34.2.4 (IEEE, 2021).
Thomas, S. An industry view on two-dimensional materials in electronics. Nat. Electron. 4, 856–857 (2021).
Milana, S. The lab-to-fab journey of 2D materials. Nat. Nanotechnol. 14, 919–921 (2019).
Goossens, S. et al. Broadband image sensor array based on graphene-CMOS integration. Nat. Photon. 11, 366–371 (2017).
Kim, K.-H. et al. Scalable CMOS back-end-of-line-compatible AlScN/two-dimensional channel ferroelectric field-effect transistors. Nat. Nanotechnol. 18, 1044–1050 (2023).
Wang, J. et al. Locally strained 2D materials: preparation, properties, and applications. Adv. Mater. 36, 2314145 (2024).
Chae, W. H., Cain, J. D., Hanson, E. D., Murthy, A. A. & Dravid, V. P. Substrate-induced strain and charge doping in CVD-grown monolayer MoS2. Appl. Phys. Lett. 111, 143106 (2017).
Jayachandran, D., Sakib, N. U. & Das, S. 3D integration of 2D electronics. Nat. Rev. Electr. Eng. 1, 300–316 (2024).
Lanza, M., Smets, Q., Huyghebaert, C. & Li, L.-J. Yield, variability, reliability, and stability of two-dimensional materials based solid-state electronic devices. Nat. Commun. 11, 5689 (2020).
Raja, A. et al. Dielectric disorder in two-dimensional materials. Nat. Nanotechnol. 14, 832–837 (2019).
Akinwande, D. et al. Graphene and two-dimensional materials for silicon technology. Nature 573, 507–518 (2019).
Shen, Y. et al. The trend of 2D transistors toward integrated circuits: scaling down and new mechanisms. Adv. Mater. 34, 2201916 (2022).
Liu, A. et al. The roadmap of 2D materials and devices toward chips. Nano-Micro Lett. 16, 119 (2024).
Migliato Marega, G. et al. A large-scale integrated vector–matrix multiplication processor based on monolayer molybdenum disulfide memories. Nat. Electron. 6, 991–998 (2023).
Migliato Marega, G. et al. Logic-in-memory based on an atomically thin semiconductor. Nature 587, 72–77 (2020).
ITRS. International technology roadmap for semiconductors 2.0. Semiconductor Industry Association https://www.semiconductors.org/wp-content/uploads/2018/09/Yield.pdf (2015).
Onsemi Intelligent Technology. Serial flash memory 2 Mb (256K × 8). Semiconductor Components Industries https://www.onsemi.com/products/timing-logic-memory/memory/flash-memory#products=fnN0YXR1c352YWx1ZX4yfiF+TGFzdCBTaGlwbWVudHN+IX5PYnNvbGV0ZX4= (2017).
Giantec Semiconductor. 4M/2M/1M/512K Industrial NOR flash. Giantec Semiconductor https://en.giantec-semi.com/NOR-Flash/Industrial-NOR-Flash (2022).
Cypress Semiconductor. S29JL032J, 32-Mb (4M × 8-Bit/2M × 16-Bit), 3 V, simultaneous read/write flash. Cypress Semiconductor https://www.infineon.com/row/public/documents/10/49/infineon-s29jl032j-32-mb-4m-x-8-bit-2m-x-16-bit-3-v-simultaneous-read-write-flash-datasheet-en.pdf (2019).
Dong, Q. et al. 11.2 A 1Mb embedded NOR flash memory with 39µW program power for mm-scale high-temperature sensor nodes. In Proc. 2017 IEEE International Solid-State Circuits Conference (ISSCC), 198–199 (IEEE, 2017).
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2024YFA1208400 and 2021YFA1200500), the National Natural Science Foundation of China (62322405, 62374042 and 62525401), the Shanghai Pilot Program for Basic Research, FuDan University 21TQ1400100 (21TQ011), the New Cornerstone Science Foundation through the Xplorer Prize and the young scientist project of the MOE innovation platform.
Author information
Contributions
C.L. and P.Z. conceived the idea; C.L., Y.J., B.S., S.Y. and Z.C. designed and conducted the experiments; Z.B., Y.X. and C.W. contributed to discussions and provided TCAD simulation support; T.W., H.W., Z.L., Y.W. and S.W. provided experimental assistance; C.L., Y.J. and B.S. co-wrote the manuscript; P.Z. provided input for the discussion and revision of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Kai-Shin Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 The performance of 2D flash cells fabricated through full-stack on-chip process.
a, b, Lossless and highly uniform 2D flash cells. Dual-sweep transfer characteristic curves of 2D flash cells fabricated on a highly flat SiO2 substrate (60 cells) and on a CMOS substrate (60 cells) through the full-stack on-chip process (a). The ON/OFF current extracted from (a) at VGS = 0 V (b). c, d, Typical transfer characteristic curves among 1,008 cells (c) and output characteristic curves among 1,012 cells (d) for state-ON and state-OFF. The 2D flash cells are programmed by pulses of progressively increasing amplitude, with a fixed pulse width of 20 ns. e, f, TCAD simulation for evaluating the programming energy consumption. The device structure used for the simulation (e). The channel length (2 μm) and width (2 μm) of the simulated MoS2 flash memory were the same as those of the fabricated 2D flash cell. The tunnelling current is simulated to be 2.3 μA under the designed operation voltage and tunnelling layer thickness (f). Therefore, the programming energy consumption is evaluated to be 20 ns × 2.3 μA × 14 V = 0.644 pJ/bit. More details about the TCAD simulation method and the energy consumption evaluation are discussed in Supplementary Information Section 5.
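The energy estimate in this caption can be checked with a short Python sketch. It assumes, as the caption's arithmetic does, that the tunnelling current is constant over the whole pulse, so that the energy is simply the product of pulse width, current and voltage. The function name is illustrative.

```python
def programming_energy_pj(pulse_width_s, tunnel_current_a, voltage_v):
    """Energy per programmed bit in pJ, assuming a constant current during the pulse."""
    return pulse_width_s * tunnel_current_a * voltage_v * 1e12


# Values taken from the caption: 20 ns pulse, 2.3 uA tunnelling current, 14 V.
energy = programming_energy_pj(20e-9, 2.3e-6, 14.0)
print(f"{energy:.3f} pJ/bit")
```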
Extended Data Fig. 2 Retention performance for 2D flash cell with HfO2 tunnelling layer.
a, b, Vth shifts for state-ON (a) and state-OFF (b) at 55 °C, 85 °C and 125 °C, extracted as the Vth difference between the as-programmed cell and the cell after baking, to represent the retention loss. c, Total memory window (MW) for different bake temperatures and times. The memory window accounts for the retention loss of both state-ON and state-OFF in (a) and (b). A cell is considered to have failed when its MW has decreased by half. d, Temperature dependence of the retention lifetime extrapolated from (c). A 10-year lifetime at 54.8 °C is extracted using the Arrhenius model.
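An Arrhenius extrapolation of this kind can be sketched in Python as follows. The sketch assumes the usual form ln(t) = ln(A) + Ea/(kB·T) for the time to failure t. The lifetimes it fits are synthetic placeholders generated from an assumed activation energy and prefactor (EA_ASSUMED and LN_A_ASSUMED, both hypothetical), not measured values from this work, so the printed temperature is purely illustrative.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_arrhenius(temps_c, lifetimes_s):
    """Least-squares fit of ln(t) = ln(A) + Ea / (kB * T).

    Returns (Ea in eV, ln(A)).
    """
    xs = [1.0 / (K_B_EV * (t + 273.15)) for t in temps_c]
    ys = [math.log(t) for t in lifetimes_s]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def temperature_for_lifetime(ea_ev, ln_a, lifetime_s):
    """Temperature in deg C at which the fitted lifetime equals lifetime_s."""
    return ea_ev / (K_B_EV * (math.log(lifetime_s) - ln_a)) - 273.15


# Hypothetical parameters used only to generate synthetic lifetimes.
EA_ASSUMED = 1.0      # eV, placeholder
LN_A_ASSUMED = -15.0  # ln(seconds), placeholder
BAKE_TEMPS_C = [55.0, 85.0, 125.0]  # bake temperatures named in the caption
synthetic_lifetimes = [
    math.exp(LN_A_ASSUMED + EA_ASSUMED / (K_B_EV * (t + 273.15)))
    for t in BAKE_TEMPS_C
]

ea, ln_a = fit_arrhenius(BAKE_TEMPS_C, synthetic_lifetimes)
TEN_YEARS_S = 10 * 365.25 * 24 * 3600
print(f"Ea = {ea:.3f} eV")
print(f"10-year temperature = {temperature_for_lifetime(ea, ln_a, TEN_YEARS_S):.1f} C")
```

With measured data, the synthetic lifetimes would be replaced by the times at which the memory window in (c) falls to half of its initial value at each bake temperature.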
Extended Data Fig. 3 The crosstalk suppression 2D NOR flash circuit design.
a, Circuit diagram of the 2D NOR flash (2 × 2 array depicted for clarity). The upper left cell is the selected cell to be programmed (red block), and the two cells adjacent to it are the half-selected cells where crosstalk may occur (blue block). The half-selected cells are exposed to only 1/2 VPP. b, Simplified band diagram of the selected cell and the half-selected cells. The tunnelling efficiency of the half-selected cells is considerably reduced by the lower electric field, because the tunnelling efficiency depends exponentially on the applied voltage. c, d, Crosstalk evaluation by applying a crosstalk pulse to 100 2D flash cells (c) and by consecutively applying multiple crosstalk pulses to one 2D flash cell (d). Pulses of 7 V and −7 V were used for programming crosstalk and erasing crosstalk, respectively. The pulse width is 100 ns.
Extended Data Fig. 4 Circuit schematic of SA module without 2D-compatible design (SA1) (a) and corresponding reading simulation results (b).
In the reading operation, MP9 is first turned on to pre-charge VCOMP to 5 V (VDD). The reference current IREF (typically 10%–30% of the extracted cell ON-state current) is generated using AMP1 and the external resistor, transferred to the VCOMP node through the current mirror and compared with the BL current IBL to generate the output signal VOUT. During pre-charge, VBL is coupled to over 3.11 V, which puts AMP2 in the wrong operating condition. Owing to the relatively small 2D current (~100 nA) and the large BL parasitic capacitance (CBL), it takes a long time for VBL to discharge through the BL to below 3 V, as required for AMP2 to work, resulting in a read error within the target read timing.
Extended Data Fig. 5 Circuit schematic of SA module with 2D compatible design (SA2) (a) and corresponding reading simulation results (b).
A switch transistor (MN7) is introduced to isolate the high capacitance load from the 2D memory array. The capacitive coupling effect is therefore weakened, and VBL is coupled to only 3.038 V, which considerably reduces the time required for VBL to discharge. Fast and accurate readout of the cell state is thus achieved. In addition, the CMOS inverter (corresponding to INV1 in Extended Data Fig. 4) is replaced with a pseudo-PMOS inverter, which reduces the SA load capacitance, for better operation mode compatibility and a lower propagation delay time.
Extended Data Fig. 6 Timing diagram of erasing and reading operations.
a, Full-chip erasing operation. The host computer sends the commands 06H and C7H. The all_en signal (which selects all WLs) is activated, selecting the entire chip. The logic module generates an erase enable signal for the analogue circuit and simultaneously produces a synchronized clock signal to apply the corresponding voltages to all cells. b, 32-bit parallel reading operation. The host computer sends the command 03H. In this timing diagram, address “00H” (corresponding to WL[0]) is selected for reading. The read enable signal is activated, and a reading bias is applied to all 32 BLs. When the rd_clk (read clock) signal is activated, the sense amplifier amplifies the readout data and sends the 32-bit data to the logic module, where the data are converted from parallel to serial. When the o_so_en (data output enable) signal is activated, the logic module transmits the data o_padout (data output) to the I/O module for data readout.
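The command framing and the parallel-to-serial step described in this caption can be illustrated with a small Python sketch. Only the command codes (06H, C7H, 03H), the 8-bit command width and the 32-bit word width come from the text. The most-significant-bit-first order, the 8-bit address width and all function names are assumptions made for illustration, and the sketch models the data flow only, not the on-chip logic or its timing.

```python
ERASE_SEQUENCE = (0x06, 0xC7)  # commands sent for a full-chip erase
READ_COMMAND = 0x03            # command sent for a 32-bit parallel read


def byte_to_bits(value):
    """Serialize one 8-bit value, most significant bit first (assumed order)."""
    return [(value >> shift) & 1 for shift in range(7, -1, -1)]


def parallel_to_serial(word, width=32):
    """Shift a parallel word out one bit at a time, MSB first (assumed order)."""
    return [(word >> shift) & 1 for shift in range(width - 1, -1, -1)]


def serial_to_parallel(bits):
    """Rebuild the word on the host side from the received serial bits."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


def read_frame(address):
    """Bits sent for a read: the command followed by an 8-bit address (assumed width)."""
    return byte_to_bits(READ_COMMAND) + byte_to_bits(address)


# Example: read WL[0] and recover an alternating 32-bit pattern from the serial stream.
frame = read_frame(0x00)
stored_word = 0xAAAAAAAA  # illustrative data, not a measured readout
received = serial_to_parallel(parallel_to_serial(stored_word))
print(frame, hex(received))
```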
Supplementary information
Supplementary Information
This file contains Supplementary Sections 1–11; Supplementary Figs. 1–32; Supplementary Tables 1–8 and Supplementary References.
Supplementary Log Data
This file contains the original log file generated during the chip testing process.
Supplementary Video 1
This file contains the process of programming a checkerboard pattern.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, C., Jiang, Y., Shen, B. et al. A full-featured 2D flash chip enabled by system integration. Nature (2025). https://doi.org/10.1038/s41586-025-09621-8