Introduction

There is a growing global consensus on the need to orchestrate energy transitions to avert the worst effects of climate change. As a result, significant efforts are being made around the world to transform our energy systems and, as part of this process, to engage with communities to increase public awareness of various clean energy options1. These efforts have increased understanding and acceptance of solar and wind energy technologies, contributing to greater energy literacy. Energy literacy is an understanding of the nature and role of energy in the world and in daily life, accompanied by the ability to apply this understanding to answer questions and solve problems. This may include public awareness of low-carbon energy sources and increased participation in decision-making about the future of energy systems2. While energy literacy fosters an individual and collective understanding of energy sources and systems, it is equally important to capture society’s broader visions for how these energy systems can drive desirable social and political outcomes, a concept known as sociotechnical imaginaries (STIs).

STIs are collectively held visions of desirable futures that are shaped by the interaction of society and technology. These imaginaries represent how societies imagine their future possibilities, particularly in relation to scientific and technological advancements, and how they envision the role of technology in shaping social and political life. Addressing STIs is crucial in the context of AI advancements to ensure that generative AI is viewed not as an unavoidable force to which society must conform, but as a tool society can leverage to shape its future across various domains. This requires recognizing the profound interplay between generative AI and the societal values, norms, and ethics that STIs embody.

A study of Taiwanese energy analyzed fragments of legal documents, energy strategies, and newspaper articles, coding each fragment as a particular STI3. It found that Green Technology & Modern Future accounted for 72.2% of the coded fragments, while the other two STIs, Nuclear Stability (17.2%) and Community Energy (10.5%), were significantly less prominent in the sample. A similar STI study in China showed a shift from conventional fossil fuels and gas vehicles towards renewable energy systems, storage, and electric vehicles4. Another article incorporated STI methods by taking 135 relevant energy abstracts and analyzing their keywords to identify the direction of energy trends5.

In the context of STIs, the emergence of state-of-the-art text-to-image generative artificial intelligence (AI) models, such as DALL-E, adds another layer of potential interaction between publics and society writ large on the one hand, and technology developers and energy system planners on the other. This paper explores the performance of generative AI models in giving shape to sociotechnical imaginaries of our future energy systems by creating realistic scientific images to motivate public engagement and increase public awareness of lesser-known or misunderstood clean energy solutions. In this paper, generative AI models are defined as “models that create images from different types of input data including but not limited to text, scene, graph and object layout”6.

In 2021, OpenAI made its text-to-image generative AI model DALL-E publicly available. Since then, there has been growing interest in exploring the potential of these models for creative and technical applications. However, the demand for generative AI models emerged well before 20217,8,9. Generative AI models were developed for a variety of reasons, such as data augmentation and the creation of new content (e.g., images, text). Previous studies in this field have utilized generative AI models for educational purposes, such as clothing and accessory image generation for craft education, crop image generation in agricultural settings, architecture and urban design, and novel art generation7,8. In the fashion industry, generative AI models are used to improve designer efficiency; for example, Yan et al.10 created a training data set of 115,584 pairs of fashionable items, which was used to test generative text-to-image AI performance.

In 2008, McCrum et al.9 used a generative model to create realistic images simulating Martian exploration robots in the Planet and Asteroid Natural Scene Generation Utility (PANGU) software. More recently, there has been a dynamic movement towards utilizing generative AI models to create images that improve the quantity and diversity of training data in predictive medical diagnostic programs11. To tackle the challenges of acquiring massive data sets under patient privacy constraints, Akrout et al.11 employed such generative AI models for data augmentation. The commonality among the studies cited above9,11 is that, when data is relatively scarce, generative AI models can augment data to improve the performance of machine learning classification algorithms by reprocessing existing data. Generative AI models have also been applied to areas beyond data augmentation, including the generation of novel artworks and the production of visual materials for communication12,13. Recently, highly sophisticated and imaginative generative AI models have even been used in architectural design. In Paananen et al.12, students were tasked with designing a cultural center on a small island using generative AI models, namely DALL-E, Stable Diffusion, and Midjourney. Figure 1 in Ref.12 illustrates one of the standout works identified as among the best designs12.

While the previously mentioned applications have employed generative AI models in a positive context, it is important to recognize that AI image generators also carry negative implications and ethical concerns. One concern is copyright violation. Images are extracted from search engines, such as Google, to train generative AI models. Since many of these images are protected by copyright, the images produced by generative AI models may breach copyright law, as the models are trained on this material without the direct consent of its creators14.

Generative AI models could also be used intentionally to generate images that portray a false representation of reality or contain disinformation. Works such as deepfakes could be used to damage reputations, blackmail individuals for monetary gain, incite political or religious unrest by targeting politicians or religious scholars with fake videos and speeches, and spread disinformation about current events15. Images produced by generative AI can also reflect and perpetuate stereotypical, racist, discriminatory, and sexist ideologies. For example, Buolamwini and Gebru16 reported that two facial AI training data sets, IJB-A and Adience, are composed of 79.6% and 86.2% lighter-skinned subjects, respectively. It was also found that darker-skinned females are the most likely to be incorrectly classified, with a classification error rate of 34.7%16. As generative AI models are trained on a wide range of images from the internet, female and female-identifying individuals face both systemic underrepresentation and stereotypical overrepresentation. For instance, only 38.4% of the facial images in a dataset of 15,300 generated by DALL-E 2 depicted women, compared to 61.6% depicting men17. These models also tend to reinforce gender roles in career depictions: occupations such as personal care supervisor, housekeeping cleaner, and teaching assistant were 7% to 25% more likely to show women, whereas male figures were 23% to 27% more prevalent in roles such as mechanical engineer, aerospace engineer, and computer programmer17. These biases may arise during the initial training phase, where a higher proportion of images of, for example, male computer programmers leads to biased outputs. These findings suggest that the ability of generative tools to serve as visioning and futuring aids can be limited by the biases and path dependencies baked into them during their development and training.

While many studies have highlighted the use of generative AI models in areas such as medical diagnostics, robotic motion planning, fashion, art, architecture, and urban design, we found a lack of studies that utilize generative AI models to address clean energy and climate change-related problems from a technical and policy perspective. Of the sources evaluated in our literature review, only one study18 performed a community-centered examination of the cultural limitations of generative AI models; however, that study was conducted specifically in the South Asian context to study the impact of global and regional power inequities. Another study19 emphasized the need to incorporate visual images alongside written language to influence public perceptions of climate change, but generative AI was not explicitly employed. Given this lack of prior investigation, our research team was interested in determining whether generative AI models can produce technically accurate images that reflect a given prompt even in specialized and technically sophisticated engineering-oriented scenarios. Furthermore, previous studies relied primarily upon widely recognized tools such as DALL-E, Stable Diffusion, and Midjourney; for example, Sapkota et al.7 used Midjourney and Vartiainen et al.8 used DALL-E 2. Recently, a plethora of models has been introduced beyond these three. Our research team engaged directly with these models, evaluating their advantages and disadvantages in the process.

In this paper, we conducted a case study testing the performance and accuracy of generative AI models on nuclear energy prompts. We analyzed 20 different generative AI models, with an emphasis on tools with an accessible Python API, and selected the top 3 performing models based on accessibility, image quality, accurate portrayal of prompts, processing time, and cost. Our study specifically tested these models on visualizing nuclear energy, a technology that has long been polarizing in the public consciousness, engendering fervent support and mistrust in equal measure. We selected prompts related to different nuclear landscapes (power plant components and processes occurring within a power plant, sites where nuclear plants might be located, and the nuclear workforce), ran these prompts through the top 3 generative AI tools, and applied prompt engineering both to enhance each generator’s ability to create images that reflect the given prompt and to increase the technical accuracy of the images. Finally, we analyzed the performance of these tools with regard to their technical accuracy in depicting nuclear engineering components such as the radiation shielding of a nuclear reactor and the primary side of a pressurized water reactor.

The work presented in this paper is novel for several reasons. First, the majority of these generative AI tools began their maturation process only a few years ago, and a restricted amount of literature and analysis is available regarding their technical accuracy in a scientific context. Second, our literature survey indicates minimal application of generative AI for generating images that foster public engagement and invite community perspectives on the intended outcomes of ongoing clean energy transitions. Third, this study assesses the robustness of current state-of-the-art generative AI models and the necessity of specialized generative AI tools within specific disciplines, where models would be trained on discipline-focused images and text captions. Applications to nuclear energy include images of nuclear fuel rod fabrication, proper waste management, and nuclear reactor designs.

The rest of this paper is organized as follows: “Generative AI” section presents the concepts behind how generative AI models work and their primary features. In “Methodology” section, we compare all tested generative AI models according to several factors, discussing their advantages and disadvantages. “Results and discussions” section presents the results for the top 3 generative AI models on similar prompts. The conclusions of this work and potential opportunities for future work are highlighted in “Conclusions” section.

Generative AI

Generative AI concepts

Generative text-to-image AI models are a subset of generative AI models that take text input and create an image based on the input description. Fig. 1 illustrates images generated by various models from text prompts. Fig. 1a,b were generated using DALL-E, while Fig. 1c was produced with Midjourney. Generative AI models can create logical as well as unusual images that would be difficult to find elsewhere, such as a turkey inside a nuclear cooling tower in Fig. 1.

Fig. 1
figure 1

(a) A turkey cooked on a nuclear power steam (DALL-E 2). (b) A happy man working at a nuclear power plant without any risks (DALL-E 2), and (c) nuclear power plant (Midjourney). Credit: The images in this figure are generated using DALL-E 2—https://openai.com/policies/row-terms-of-use/ and Midjourney—https://docs.midjourney.com/docs/terms-of-service.

As evident from Fig. 1, generative AI models process the captions provided by the user and produce corresponding images. Interestingly, both DALL-E and Midjourney generated images of cooling towers in response to the text “nuclear power plant”. This suggests that these models have been pre-trained to associate the text “nuclear power plant” with the concept of a cooling tower, likely because cooling towers are often the most visually prominent aspect of images of nuclear reactors. When generative AI models produce images of cooling towers, they capture an important feature of a nuclear plant but fail to depict other important features, such as the reactor system itself. This is one of the gaps we identified in this work. The training process of text-to-image generative AI models is briefly described next:

  1.

    Training concept: Generative AI models use a pre-trained data set of images that links natural language to images. A popular pre-trained deep learning model is Contrastive Language Image Pretraining (CLIP), developed by OpenAI; CLIP was trained on 400 million images paired with text20. CLIP learns how strongly a caption relates to a given image. CLIP follows this workflow: images and text captions are passed through encoders, which map all objects to an m-dimensional space21. The cosine similarity is then computed between the text caption and image encodings. Ideally, training should maximize the cosine similarity between the N correct encoded image and text caption pairs, while minimizing the cosine similarity between the \(N^2 - N\) incorrect pairs21 (a minimal sketch of this objective is given after this list). By comparison, other examples of models that use contrastive learning are ALIGN and CLOOB.

  2.

    Decoding and transformer models: CLIP learns both an image and a text encoding; Radford et al.21 used autoregressive models and diffusion models to map the text caption encoding to the image encoding for each image. The researchers found that both approaches produced similar results and that diffusion models were computationally less intensive21. The diffusion model (also known as the diffusion prior) is therefore used to map the text caption encoding to the image encoding, as displayed in Fig. 2. After CLIP produces the encoded image and caption data, the generative AI tool (e.g., DALL-E) needs to reverse this process to generate an image; it uses a second diffusion model (the decoder) to turn the CLIP encodings back into an image. Inspired by the principles of thermodynamics, diffusion models add Gaussian noise to data and then reverse the diffusion process to restore clarity to the noised images22 (a simplified sketch of the forward noising step is given after Fig. 2). DALL-E uses GLIDE, a transformer-based model by OpenAI, to decode the image and caption data from CLIP: GLIDE encodes the caption string into tokens, passes the tokens through a transformer, conditions on the final token embedding, and projects the token embeddings into additional context for image generation23.
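As an aside, the symmetric contrastive objective described in step 1 can be written compactly. The following minimal sketch (our own illustration in PyTorch, not code from CLIP itself) assumes a batch of N precomputed image and caption embeddings:

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over N image/caption pairs.

    image_emb, text_emb: (N, m) tensors from the image and text encoders.
    The N diagonal entries of the similarity matrix are the correct pairs;
    the remaining N^2 - N off-diagonal entries are the incorrect pairs.
    """
    # Normalize so that the dot product equals the cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (N, N) cosine-similarity matrix, scaled by a temperature.
    logits = image_emb @ text_emb.T / temperature

    # Cross-entropy pulls the N correct (diagonal) pairs together and
    # pushes the N^2 - N incorrect (off-diagonal) pairs apart.
    targets = torch.arange(image_emb.shape[0])
    loss_images = F.cross_entropy(logits, targets)    # image -> caption
    loss_texts = F.cross_entropy(logits.T, targets)   # caption -> image
    return (loss_images + loss_texts) / 2
```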

Fig. 2
figure 2

Diffusion model flow. Credit: The image is taken from Ref.22.
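To make the diffusion step in Fig. 2 concrete: in the standard DDPM formulation, the forward (noising) process has the closed form \(x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon\), where \(x_0\) is the clean image, \(\epsilon\) is Gaussian noise, and \(\bar{\alpha}_t\) follows the noise schedule. The sketch below is our own simplification of this step, not code from any of the tools tested:

```python
import torch

# A common linear noise schedule over T diffusion steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative products

def forward_diffusion(x0, t):
    """Sample x_t ~ q(x_t | x_0) by adding Gaussian noise to a clean image.

    x0: clean image batch, shape (B, C, H, W)
    t:  integer timestep indices, shape (B,)
    """
    noise = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1)  # broadcast the schedule per sample
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
    # A denoising network is trained to predict `noise` from (xt, t);
    # image generation then runs this process in reverse from pure noise.
    return xt, noise
```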

Fig. 3 shows how text-to-image generative AI works end to end. A user inputs a text prompt, CLIP’s encoder maps it into the m-dimensional space, the diffusion prior maps the CLIP text encoding to the corresponding CLIP image encoding, and the GLIDE model then uses reverse diffusion to map the CLIP text and image encodings to an image matching the input text description23.

Fig. 3
figure 3

A schematic of how text-to-image generative AI tools work. Credit: The image on the right of the figure is generated by DALL-E 2 (https://openai.com/policies/row-terms-of-use/) and the concept in the figure is based on Ref.24.

Generative AI features

The prompt is the caption from which an image is created; generative AI tools rely on the prompt to capture the user’s intention for the generated image. Some generative AI tools, such as Canva’s text-to-image generation service, have a graphical user interface (GUI) that allows a user to input a prompt and generate an image from it. Other generative AI tools offer either free or paid API access, where a user can submit a prompt from a Python script. While generating images through a GUI is intuitive, it is not optimal for producing a large volume of images from extensive prompt sets. Consequently, the ideal scenario is when both a GUI and an API are available.
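As an illustration of API-driven generation, the minimal Python sketch below uses OpenAI’s image generation endpoint; the prompt and image size here are placeholders, and other tools expose comparable REST or Python interfaces:

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

response = client.images.generate(
    model="dall-e-2",
    prompt="A photorealistic nuclear power plant beside a river, highly detailed",
    n=4,              # request several candidate images per prompt
    size="512x512",
)

# Each candidate is returned as a URL that can be downloaded and reviewed.
for i, image in enumerate(response.data):
    print(f"Candidate {i}: {image.url}")
```

Looping such a script over a list of prompts is what makes API access preferable to a GUI for large-scale image generation.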

In order to take full advantage of text-to-image generative AI models, we looked for models that supported a text input prompt, inpainting, outpainting, model training, and image-to-image editing. Each of these terms is described below.

Inpainting is a tool that takes missing or unknown parts of an image and uses AI to generate the unknown region25. Generative AI models are trained on an extensive set of images; inpainting draws on this training data to replace specific parts of an image. Inpainting is most commonly used for the removal of unwanted objects, image restoration, and image editing26. Fig. 4 shows an example of inpainting from Stable Diffusion.
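Programmatically, inpainting is typically invoked by supplying a source image together with a mask whose transparent pixels mark the region to be regenerated. The sketch below uses DALL-E 2’s edit endpoint with placeholder file names; outpainting (described next) can use the same endpoint by pasting the source image onto a larger, partly transparent canvas:

```python
from openai import OpenAI

client = OpenAI()

# The mask must have the same dimensions as the image; fully transparent
# (alpha = 0) pixels mark the region the model is asked to repaint.
response = client.images.edit(
    model="dall-e-2",
    image=open("plant_worker.png", "rb"),
    mask=open("worker_mask.png", "rb"),
    prompt="Person near a nuclear power plant in a hazmat suit",
    n=1,
    size="512x512",
)
print(response.data[0].url)
```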

Fig. 4
figure 4

Inpainting from Stable Diffusion: rotating head and hand. Credit: This figure was taken from this source27, which is generated using Stable Diffusion—https://stability.ai/news/stable-diffusion-public-release.

Outpainting is the opposite of inpainting; it extends the borders of an image, adding new content beyond them using AI28. Outpainting can be used to change the aspect ratio of an image and to extend its borders. Fig. 5 shows an example of outpainting using the DALL-E model.

Fig. 5
figure 5

Girl with a pearl earring (left) and DALL-E 2 outpainting of “Girl with a Pearl Earring” (right). Credit: This figure was taken from this source29 which is generated using DALL-E 2 software—https://openai.com/policies/row-terms-of-use/.

Image-to-image models take an image as input and allow specific edits to be made, yielding a fine-tuned output30. Commonly, image-to-image models allow for style changes, altered resolution, and the generation of high-quality images from low-quality ones. Fig. 6 shows a high-quality rendering of an apple created from a basic sketch using image-to-image technology in Stable Diffusion.
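As a related programmatic example, DALL-E 2 exposes a variations endpoint that takes an image as input and returns stylistically similar outputs; a minimal sketch with a placeholder file name follows. Stable Diffusion’s image-to-image mode additionally accepts a text prompt and a strength parameter controlling how far the output may depart from the input.

```python
from openai import OpenAI

client = OpenAI()

# Generate variations of an existing image (e.g., a rough apple sketch).
response = client.images.create_variation(
    image=open("apple_sketch.png", "rb"),
    n=2,
    size="512x512",
)
for image in response.data:
    print(image.url)
```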

Fig. 6
figure 6

Image-to-image generation of an apple from a sketch in Stable Diffusion. Credit: This figure was taken from this source31, which is generated using Stable Diffusion—https://stability.ai/news/stable-diffusion-public-release.

“Model training feature” refers to training a machine learning model, usually a neural network, on a sample of image-caption pairs. The generative AI then uses the input sample of images as training data and outputs several images following a prompt32. For instance, to develop a specialized generative AI tailored to nuclear power, one can train the model using captions such as “nuclear power plant” accompanied by a variety of images from different nuclear power plants. Ideally, through this training, the AI becomes capable of producing more realistic and accurate images when prompted with content related to “nuclear power”.

Methodology

Comparison of generative AI models

We tested 20 text-to-image generative AI tools in total, with varying results: Table 1 lists the tools with promising performance and Table 2 the tools with poor performance. In our initial evaluation, we first identified which tools had API access. Tools without API access, such as NightCafe, Fotor AI, and Artbreeder, were removed. Additionally, tools such as DreamStudio that used the same API as another model (Stable Diffusion) were also removed. We then narrowed down the tools based on their ability to generate images. Parti and Google Brain’s Imagen were eliminated because they are not available to the public. DeepAI was similarly eliminated because it offers only paid subscription services. Other text-to-image generative AI tools, such as StackGAN++ and CLIP, required training data in order to generate images and were also eliminated. Midjourney is a popular text-to-image generative AI model, but we decided against using it since a Discord account is required to access the API and, due to heavy usage, the servers were generally unavailable. We then reviewed the commercial rights of the remaining programs and found that Leonardo.Ai did not support commercial use; it was thus not considered for detailed analysis. Starryai, Picsart, Kapwing, and Writesonic additionally produced images of poor technical quality when tested on basic prompts, including “Display radioactive nuclear waste” and “China and Nuclear.” These prompts were selected for their simplicity; models were removed if they produced poor technical detail for a basic nuclear prompt (e.g., faces morphing in a nuclear setting, or failing to display cooling towers). A summary of the basic prompts, along with their results, is provided in Table 5. Of the remaining systems, DALL-E 2 and Stable Diffusion both required paid subscriptions; however, they were chosen for their inpainting/outpainting and image-to-image editing capabilities and their good performance in image generation. In contrast, Canva and Craiyon both have free subscriptions but no inpainting/outpainting or image-to-image editing. Canva also had a very long image generation time compared to all the other models and was thus removed.

Table 1 Successful generative AI models.
Table 2 Unsuccessful generative AI models.

Rationality of prompts

The prompts generated in this study were chosen to explore themes that have led to controversy and polarization of perspectives on nuclear energy. These themes include gender imbalance in the nuclear industry workforce, the negative effects of nuclear energy on Indigenous communities, the impacts of a nuclear power plant on its possible surroundings (i.e., nature), long-term storage of nuclear waste, and the technical understanding of nuclear reactor systems and components (i.e., reactor core, fuel, shielding, types of reactors). To be clear, we did not prompt the AI image generators to reproduce these known biases; instead, our goal was to explore whether the image generators reproduced them when given a more general prompt. For example, when prompted to depict nuclear plant workers, would the image generator reproduce known gender imbalances in the nuclear sector?

First, a major set of our prompts aims to assess generative AI’s ability to understand nuclear reactor components (i.e., reactor core, fuel, shielding, and types of reactors). The intricate design and functionality of nuclear reactors depend on specific components like the reactor core, fuel, and shielding, all of which play critical roles in ensuring operational efficiency and safety. Prompts that ask AI to generate depictions or explanations of these components explore whether generative models can accurately replicate the detailed engineering aspects of nuclear technology. The variety of reactor types (e.g., pressurized water reactors, boiling water reactors, and advanced designs) adds another layer of complexity that AI must handle.

Another focus of our prompts is on workers in the nuclear industry. Historically, the nuclear industry has been male-dominated, with significant gender disparities in the workforce. By generating prompts focused on gender, the study seeks to explore how generative AI visualizes or narrates this issue. This can provide insights into whether AI models perpetuate existing stereotypes or present a more inclusive view of women in technical roles.

Third, a set of our prompts focuses on possible surroundings for nuclear reactors (i.e., nature). Nuclear reactors are often built near natural environments such as rivers, lakes, or forests. The interaction between nuclear installations and nature is a significant visual and thematic prompt, as it reflects the tension between technological advancement and environmental preservation. By generating AI representations of nuclear reactors in natural settings, the study aims to assess how AI conceptualizes this coexistence and whether it focuses on harmony or conflict between industrial structures and ecosystems.

Fourth, a set of our prompts highlights the depiction of nuclear waste in the media. Nuclear waste disposal is a contentious issue that raises concerns about long-term storage, safety, and environmental hazards. The way nuclear waste is depicted in the media (e.g., as dangerous barrels, underground storage, or radioactive symbols) heavily influences public perception of the risks associated with nuclear energy. These prompts explore how AI interprets and visualizes the concept of nuclear waste based on common media narratives, contributing to our understanding of whether AI reinforces alarmist imagery or provides a more nuanced view.

Fifth, some of the prompts focus on the negative effects of nuclear energy on Indigenous communities. The potentially harmful impacts of nuclear energy, such as radiation exposure, displacement due to reactor accidents (e.g., Chernobyl, Fukushima), and long-term health risks from uranium mining, are critical societal concerns. Prompts exploring these negative effects aim to generate AI responses that reflect or highlight the complex and often fraught legacy of nuclear technology.

For example, the Navajo people were subjected to health risks and environmental impacts from uranium mining in the twentieth century. A former Navajo uranium miner, George Tutt, was quoted as saying, “We were blessed, we thought. Railroad jobs were available only far off like Denver ... but for mining, one can just walk to it in the canyon. We thought we were very fortunate, but we were not told, ‘Later on this will affect you in this way’”33. The Navajo people believed they were fortunate to have mining opportunities close to home, instead of traveling to, for example, Denver for railroad work. They were unaware of what the term radioactivity meant and of the health risks associated with handling uranium33. Uranium mining stopped in the 1960s; however, its effects are still being seen today among the people of New Mexico. Nearby water and land have shown above-background levels of uranium, as documented by a study that curated a map of elevated uranium levels34. Another study found that Navajo children born near uranium mines were 1.83 times more likely to have 1 of 33 selected birth defects35.

Prompt engineering

Prompt engineering refers to optimizing the prompt (the text input to a model) to generate the desired images from text-to-image generative AI models. Prompt engineering can help achieve the desired result from a pre-trained model, reducing the computational resources and knowledge needed to fine-tune these models for different tasks36. Beyond text-to-image models, this method has been applied to other generative models as well, such as GPT-3 and ChatGPT, which are text-to-text generative AI models.

Prompt engineering is an iterative process and enables efficient interaction with the latent space of generative models. Researchers have identified and classified different types of keywords that produce images closer to the desired results37. Certain types of keywords, such as ‘hyperrealistic’, ‘oil on canvas’, ‘abstract painting’, and ‘in the style of a cartoon’, are especially useful in directing the style of the image, as displayed in Table 3. Such keywords have therefore been used in this study to generate images closer to real life.

One of the main characteristics the authors identified for images related to nuclear energy is that they should look realistic in order to avoid exaggeration, which can occur due to the artistic nature of these models (see, for example, the sample results in Table 8). Further, the images should be detailed enough to capture the intricacies of different components, especially in the case of technical designs. To ensure these characteristics are reproduced in the generated images, we included style-modifier and quality-booster keywords in the prompts37, encouraging realistic flair and high detail. Further, to improve image quality, an additional description of the subject’s visual appearance was appended to the prompt. Fig. 7 displays the flowchart for the prompt engineering process adopted in this study. The method is implemented iteratively, changing the keywords and descriptions associated with the prompt to obtain results as realistic as possible; a minimal illustrative sketch of this loop is given after Fig. 7.

Table 3 Dictating image style of nuclear cooling tower with prompt engineering.
Fig. 7
figure 7

Prompt engineering algorithm.
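The iterative procedure of Fig. 7 can be expressed as a simple loop over candidate keyword combinations. The sketch below is our own illustration: the `generate_image` call is a placeholder for any of the model APIs discussed above, and the keyword lists are examples of the style-modifier and quality-booster categories from Ref.37:

```python
import itertools

STYLE_MODIFIERS = ["hyperrealistic", "photorealistic", "oil on canvas"]
QUALITY_BOOSTERS = ["highly detailed", "4k", "sharp focus"]

def engineer_prompts(base_prompt, appearance_hint=""):
    """Yield candidate prompts: base prompt + style modifier + quality
    booster, plus an optional plain-language appearance description."""
    for style, quality in itertools.product(STYLE_MODIFIERS, QUALITY_BOOSTERS):
        prompt = f"{base_prompt}, {style}, {quality}"
        if appearance_hint:
            prompt += f", {appearance_hint}"
        yield prompt

# Iterate until a human reviewer judges an output realistic enough (Fig. 7).
for prompt in engineer_prompts(
        "spent fuel pool inside a nuclear power plant",
        appearance_hint="deep rectangular pool of glowing blue water"):
    # image = generate_image(prompt)  # placeholder for a model API call
    print(prompt)  # in practice, generate and review each candidate image
```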

Results and discussions

Of the 20 AI models explored, we narrowed our focus to three based on API access, cost, successful generation of images, and accurate portrayal of prompts. As our focus in this study is generating high-quality images that accurately illustrate the prompts, we concentrated on DALL-E, Craiyon, and DreamStudio. Despite the costly credit systems of DALL-E and DreamStudio, these tools produce high-quality images and support inpainting, outpainting, and image-to-image editing. We also chose Craiyon because it offers high-quality image generation with only optional costs.

Results for general prompts

We tested the narrowed pool of 3 generative AI models with 10 prompts, selected from an initial 36, to evaluate the quality of images; for brevity, two samples are shown in Table 4. All the AI-powered generator models gave multiple image outputs for a single prompt, from which the image that portrayed the prompt with the highest technical accuracy was chosen.

For initial testing, the generative AI algorithms DALL-E 2, DreamStudio, and Craiyon were tested in an area that has already been researched extensively, is well documented with training data, and is not technically complicated. Since there is already a plethora of literature on the generation capabilities of such systems for animals, our group chose an animal, a bunny, for image generation38,39,40. In real life, these animals, much like a nuclear reactor, are not simply floating in space; they have surroundings, i.e., a field, sand, rocks, or grass. As a result, the initial prompt became "High quality image of bunnies in a field". This prompt produced similar results among the three tools, each with grass and varying bunny colors. Each bunny appears accurate, with correctly depicted ears, head, and body shape. DALL-E 2 produced the most realistic image, which appears to be a cottontail bunny. This tool produced extremely realistic grass; however, DALL-E 2 generated only one bunny when asked to produce “bunnies”. DreamStudio produced two bunnies that look realistic. The grass appears over-saturated, and the bunnies’ coloring appears slightly off and somewhat “cartoonish” (as interpreted by members of the research team); nevertheless, the result is a technically accurate depiction of bunnies. Craiyon produced two bunnies that appear physically accurate. The grass is out of focus and does not look as realistic as in DALL-E 2 and DreamStudio.

Since the generative AI performed well for small-scale realistic images of bunnies and their surroundings, a second prompt was generated to determine whether accurate scaling could be applied to larger-scale surroundings (i.e., mountains, water, or sand dunes) with more complex phenomena like shadows, and to test whether the algorithms could render a change in art style. The Michigan sand dunes are a well-known large-scale landmark for the researchers of this study, so the prompt “An oil painting of Michigan sand dunes” was chosen. For this second prompt, we generated four image outputs, from which the image that portrayed the prompt with the highest technical accuracy was chosen. From these tests we observed that DALL-E 2 created the image that most resembles an oil painting. It has an accurate depiction of sand, sky, and sea grass, and also does an excellent job with shadows. In comparison, DreamStudio did not quite create an oil painting, but did create an image with painterly qualities, such as the appearance of brush strokes and watercolor themes; it correctly depicted sand dunes and sea grass. Craiyon produced a realistic image that we would not consider an oil painting. Its shadows appear consistent with a light source to the left of the image. Craiyon accurately generated a large body of water in front of the sand dunes, presumably the Great Lakes, and depicts sand dunes with abundant sea grass by the water. Overall, these generative AI models appear to produce accurate details for prompts describing the natural environment.

Table 4 Image generated from “general life” topics.

Results for nuclear power prompts—promising performance

As indicated in “Results for general prompts” section, the generative AI models have successfully generated accurate images in response to prompts related to the natural environment. Next, we examine the performance of these models when given prompts related to nuclear energy. We provide nuclear-related prompts to the models and analyze the outcomes to understand their proficiency in generating images in this specific domain.

In this exploration, we tested the 3 generative AI tools against four prompts; the results are shown in Table 5. In our first prompt, we asked all 3 tools to produce an image of a “Person who works in the nuclear industry”. DALL-E 2 produced an image of a masked man working at a nuclear power plant, standing next to a single cooling tower. The image appears very detailed and realistic, though it shows only a cooling tower and no reactor building, and it does not accurately depict the attire of nuclear plant workers. DreamStudio produced two male workers in work attire and hard hats inside a nuclear power plant. Craiyon created a man in a hard hat in front of an electrical grid; it produced nothing directly related to a nuclear power plant, but did display a power transformer. Interestingly, each model depicted only men as nuclear plant workers, thus reproducing existing gender imbalances. It is also notable that DALL-E 2 and DreamStudio generated images of workers who appear to be Caucasian, whereas Craiyon generated an image of an ethnically ambiguous worker.

The second prompt we tested was “Impact of Uranium mining on Indigenous Peoples’ traditional lands”. DALL-E 2 produced an image of dry desert land with a small pond nearby and cut-down trees; it does not appear to be a uranium mine, though the image is high quality. DreamStudio produced a more accurate image of a uranium mine, depicting rock and dirt excavated at different levels, and showed animals and tools at the bottom of the image, implying that these are Indigenous tools. Craiyon produced a technically accurate depiction of uranium mining, showing different mining levels in a desert environment, although the output is more of a drawing or painting than a photographic image. However, Craiyon generated nothing related to Indigenous people. We then refined this prompt by specifying Navajo instead of Indigenous people, changing it to “Impact of Uranium mining on Navajo traditional lands”; in this case, Craiyon and DreamStudio captured the landscape of the Navajo Nation, indicating that Craiyon’s performance improved as the prompt became more specific. DreamStudio also included a uranium mine in the image. However, DALL-E produced an image of dry land, failing to generate both the uranium mine and the Navajo land.

The fourth prompt we tested was “Wildlife near a nuclear plant”. DALL-E 2 produced two ducks on dirt surrounded by grass, with two cooling towers in the background; the detail of the ducks and the cooling towers is accurate and realistic. DreamStudio generated a deer next to a cooling tower in long grass; the image looks noisy and grainy. Some features of DreamStudio’s image lack detail: the sky is not a palette of blues, there are no clouds or other background scenery, and the grass is disproportionately tall and one-dimensional compared to the deer and cooling tower. Nevertheless, this image still accurately represents the prompt. Craiyon accurately produced two cooling towers; however, it attempted to generate an animal at the top of the steam clouds. It is also worth noting that the steam exits the two cooling towers in opposite directions, which is hardly possible if the steam is carried by the wind. Despite this error, the result was counted among the successful attempts, as it still accurately portrayed nuclear cooling towers and attempted to create an animal.

The aforementioned prompts were chosen to address possible gender bias, depictions of nature, and the other themes outlined in “Rationality of prompts” section.

It seems that general nuclear energy prompts produce promising results; however, we observed that nuclear prompts almost always produce cooling towers. This could be because the data sets used to train these generative models contain far more images of cooling towers, which are widely available on the internet, than of other technical components, which could in turn be a by-product of the secrecy, preservation of intellectual property, and export controls surrounding nuclear energy technology. This suggests that all the explored models associate nuclear energy with cooling towers; it also suggests that these generative AI models do not have a thorough understanding of the components of nuclear power plants beyond cooling towers.

Table 5 Promising results from nuclear related prompts.

In the next phase of this work, we went beyond image generation and explored image editing capabilities using the inpainting and outpainting functionalities. Starting from the image that DALL-E 2 generated for the prompt “Person works in the nuclear industry”, we used the inpainting prompt “Person near a nuclear power plant in a hazmat suit”. The resulting image, shown in Fig. 8, depicts a man in a hazmat suit.

Fig. 8
figure 8

DALL-E 2 Inpainting for a nuclear energy prompt. Credit: The images in this figure are generated using DALL-E 2—https://openai.com/policies/row-terms-of-use/.

Next, we used the outpainting feature with the DALL-E 2 image from the prompt “Wildlife near a nuclear plant” to expand the borders on the left side of the image. The outpainting algorithm added more ducks as well as a third cooling tower. The resulting images are shown in Fig. 9.

Fig. 9
figure 9

DALL-E 2 Outpainting for a nuclear energy prompt. Credit: The images in this figure are generated using DALL-E 2—https://openai.com/policies/row-terms-of-use/.

Table 6 Poor nuclear results.

Results for nuclear power prompts—poor performance

In “Results for nuclear power prompts—promising performance” section, we examined successful cases of generative AI with nuclear energy prompts. However, the models also occasionally generated poor images depending on the prompt, as shown in Table 6. The first unsuccessful prompt was “China and nuclear”. DALL-E 2 produced a flag similar in color and pattern to the Chinese flag, with an atomic symbol on it. DreamStudio produced two extremely wide cooling towers, but nothing in the picture is indicative of China. Craiyon produced another flag similar to the Chinese flag, but with an unusual blue stripe and nothing indicative of nuclear energy. In short, the text-to-image generative AI models struggled to link countries to nuclear energy.

Next, we tested the prompt “Display radioactive waste”. DALL-E 2 produced an image of a crate containing what our researchers believed to be stones, with an atomic logo and incomprehensible text on the lid. DreamStudio produced boxes with a pattern and text on a yellow background; this image did not appear related to the prompt. Craiyon came closest to depicting nuclear waste (though still very far off), showing an atomic logo on a cylindrical container. None of these images is a correct depiction of nuclear waste.

Our third prompt was “Create a functional diagram of a nuclear reactor core”. DALL-E 2 showed a nuclear reactor core from the top down and got the circular shape right, but the image contained text that is not English and appears meaningless, and the diagram is not technically accurate. DreamStudio attempted to create a diagram of a reactor core, but the words are not legible, the diagram is difficult to see, and it too is incorrect on a technical level. Craiyon did not create a diagram at all, instead producing a glowing blue cylinder on a grey base. Overall, none of these images shows a correct diagram of a nuclear reactor core.

While general nuclear prompts produced promising results, anything technical or requiring words produced meaningless output. In an ultimately unsuccessful attempt to obtain better results, we used Leonardo.Ai’s model training feature with a small data set of 8 images to produce a more accurate nuclear diagram. Fig. 10 illustrates the training set as well as the diagrams produced. The diagrams generated by the trained model appeared plausible at first glance; however, upon closer examination numerous issues surfaced. Firstly, the characters displayed on the diagram remained intricate gibberish, and the nuclear fuel rods that should be situated within the reactor core were absent. Although images with such technical and specialized content prioritize precise information transmission over creativity, none of the tools satisfied this criterion. It appears that more extensive training and meticulous adjustments are necessary.

Fig. 10
figure 10

Training set and output of various prompts to provide a sketch of a nuclear reactor core. Credit: The images in this figure are generated using Leonardo.AI—https://leonardo.ai/terms-of-service/.

Results for prompt engineering

To test the efficacy of prompt engineering as outlined in Fig. 7, a set of 7 prompts related to nuclear engineering was collected across a set of subjects. Tables 7, 8 and 9 display the prompt engineering results for the DALL-E 2, Craiyon, and DreamStudio models, respectively. Each table contains the original prompt provided by the analyst, the image associated with the original prompt, the prompt as modified by prompt engineering, and the image associated with that modified prompt.

In the case of DALL-E 2, Table 7 illustrates a combination of promising and unsatisfactory outcomes following prompt engineering. Notably, prompts 1, 2, and 4, related to the control room, spent fuel pool, and fission reaction, exhibited considerable improvement, while the others remained inaccurate. Prompt 1, for instance, resulted in a modified image of the control room that closely resembled an actual nuclear control room; however, it omitted the nuclear reactor core. Despite the modified prompt 2 omitting nuclear waste and spent fuel, it still yielded a realistic image of the spent fuel pool, where nuclear waste is temporarily stored for cooling after being discharged from the reactor. Prompt 4, focusing on the fission reaction, displayed atoms splitting into smaller atoms, reflecting the process of fission; however, its results still contained unreadable wording and gibberish text. Prompts 3, 5, 6, and 7 remained considerably distant from reality. For instance, prompt 6 failed to depict the nuclear fuel pellet and nuclear fuel rod in the context of a birthday event.

Tables 8 and 9 depict the outcomes for Craiyon and DreamStudio, revealing inferior performance compared to DALL-E 2 across both original and modified prompts. The images generated by these models consistently exhibit unrealistic characteristics. Notably, the only instances of improvement are observed in prompt 1 for Craiyon and DreamStudio, associated with the nuclear control room. Additionally, prompt 2 for DreamStudio yields somewhat realistic images of the spent fuel pool, with both the original and modified versions displaying fair quality.

Table 7 DALL-E 2 results with prompt engineering.
Table 8 Craiyon results with prompt engineering.
Table 9 DreamStudio results with prompt engineering.

Discussion

The main goal of our research is to create realistic and accurate images that can depict the complex sociotechnical nature of nuclear energy systems, in order to assess the reliability of generative AI models as tools for public engagement in a scientific context. Text-to-image systems appear to successfully depict only the cooling towers of a nuclear plant; they struggle with the technical details of nuclear power plants. For example, as shown in Fig. 10, none of the models could produce a diagram of a nuclear reactor, even given training templates depicting real nuclear reactor cores. Additionally, radioactive waste was portrayed incorrectly by DALL-E 2 and DreamStudio, and only Craiyon depicted a barrel, which is still far from how radioactive waste and its storage casks actually look. Such outcomes indicate that the models have not yet been adequately trained on data related to nuclear energy technologies. This conclusion may also hold for other scientific disciplines, as similar outcomes were observed in the energy sector for the design of thermal solar panels and solar heat collectors41, where DALL-E 2 was used to visualize possible designs combining power generation and solar water heating. Consequently, developing a generative AI specialized in a specific energy type, such as nuclear or solar, necessitates the acquisition of a greater volume of nuclear or solar energy-related data. Ensuring sufficient high-quality training data must undoubtedly be incorporated into future work.

Comparing all three models, DALL-E 2 gave the best results with prompt engineering. We also noticed that DALL-E 2 generated better images when only a small number of subjects were present in the prompt; otherwise, different objects interpolated into each other. For instance, in the case of Prompt 2 in Table 7, optimal results were obtained by removing nuclear fuel from the original prompt and instead describing the spent fuel cooling pool in detail. Similarly, in the case of Prompt 1, optimal results were obtained by removing the nuclear reactor core from the prompt. This pattern was observed in all three models. From the prompt engineering results, it can be observed that generative AI models give better results for nuclear components with a substantial number of images on the internet, such as cooling towers or nuclear reactor control rooms, than for prompts referring to subjects with comparatively few images (e.g., steam generator, reactor core, fuel rod). Further, results can be improved by giving the model visual cues in layman’s language, as in the case of cooling fuel in Prompt 2 of Table 7. However, all the generative AI models are still unable to comprehend technical terms, relying in most cases on the appearance description provided. We conducted a single execution for each model, and from the approximately 3 to 4 images obtained from that run, we selected the images we deemed to be of the highest quality to include in this paper.

Through this study, we also identified several common issues that the models encounter during image generation. All generative AI tools struggled to draw human faces accurately. This may be due to the enormous variety of human facial expressions and features, which would require an extremely large database of human faces to portray accurately. As Prompt 1 of Table 5 shows, one can recognize that the figures are human, but the faces are off: the man’s right eye is malformed. For Prompt 3 in the same table, the eyes are completely overshadowed by the hat.

Crucially, the generative AI tools also perpetuate prevailing biases related to gender and employment within the nuclear energy sector. When prompted to generate images of nuclear plant workers, the models predominantly generated images of Caucasian men. While this particular flaw could be circumvented in public engagement and visioning efforts by specifically prompting the models to generate images of “a diverse workforce” or “a workforce with equal representation of genders and diverse representation of ethnicities”, a user who is not aware of the limitations and biases of the models is unlikely to use such prompts from the outset.

Additionally, as noted above, the models inadequately depict Indigenous environments, which have traditionally served as locations for resource extraction and the disposal of nuclear waste by energy industries. Indigenous communities in the Intermountain West have been displaced and impacted by uranium mining as well as by the development of nuclear weapons facilities. Given this limitation, generative AI tools in their current form are unlikely to be well suited for public engagement and visioning efforts in these communities, because the tools are unable to depict the landscapes that would be central to such efforts.

Furthermore, words were rendered as nonsense. Despite English prompts, the characters in the generated images were intricate symbols rather than alphabetic letters. There could be multiple reasons for this phenomenon, but the most plausible explanation is insufficient training of the models in depicting textual content directly within images. In other words, while the machine has acquired the capability to illustrate the entity referred to by the text “nuclear power reactor”, it has not been trained to render the text “nuclear power reactor” itself as an image.

Our results are mirrored across other domains, such as ad creation, artificial design fiction, and medicine. In ad creation, “generating people remains the most difficult task that even fine-tuning cannot resolve with sufficient realism”42. Moreover, Ref.43 observed similar issues with unreadable text. In the medical field, GANs have failed to reconstruct image details, which can lead to loss of information or the creation of fake, non-existent details44.

In this work, it is important to address two key areas that can bias the images towards nuclear energy, in both a positive and a negative light. The first concerns prompt generation. Among our multidisciplinary team of nuclear engineers, AI researchers, and data scientists, the majority of researchers who chose the prompts and verified their quality had nuclear engineering backgrounds. This majority may inadvertently introduce biases towards nuclear energy in prompt creation and skew the representation of nuclear energy in the generated images. In addition to biases in the overall perception of nuclear energy, our team created prompts native to their own cultures and experiences. These prompts may reflect the backgrounds of members of this group but could limit diverse perspectives and overlook viewpoints and communities that have different relationships with nuclear energy. This study could benefit from including other disciplines in the prompt creation process, such as individuals from the social sciences and humanities. These perspectives could enrich the understanding of the societal implications of nuclear energy beyond the themes of gender and impacts on Indigenous communities that are two of this study’s areas of focus, and could potentially uncover further biases in generative AI algorithms. As our team embarks on a newly funded research project building on this work, future research will involve collaboration with social scientists and Indigenous groups, who will provide feedback on the AI-generated images.

It is also important to note that the chosen prompts and the analysis of the generated output were based on in-depth knowledge of nuclear reactors. However, the accuracy standards nuclear engineers apply to AI-generated nuclear reactors may be higher than necessary, and thus unsuitable, for informing the general public about nuclear energy or for policy purposes, which is the purpose of this study. Secondly, an algorithmic bias exists in generative AI algorithms themselves. Generative AI is predominantly trained on English-language media and therefore propagates biases observed in English-speaking cultures. These biases can affect the outputs of these algorithms and generate representations of nuclear energy that are inaccurate for non-English-speaking or non-internet-using regions of the world.

The authors acknowledge their potential implicit biases regarding nuclear energy, as approximately 75% of the team is either pursuing or holds a degree in nuclear engineering. As a result, some prompts and interpretations may unintentionally reflect personal beliefs. However, the team also includes members from interdisciplinary fields, such as data science and computer science, alongside nuclear engineering. Looking ahead, one of the group’s goals is to collaborate with Indigenous communities to inform prompt generation and improve image accuracy. Input from those negatively affected by nuclear power is invaluable and brings insights that our researchers may not have considered. While our technical team is proficient in interpreting nuclear-related diagrams, feedback from Indigenous communities offers a fresh perspective for analyzing generative AI models, which is the focus of our next study. The objective of this study is to highlight the gaps in these technologies and pave the way for future improvements. Our future plans also include incorporating sociotechnical imaginary (STI)3,4 methods, bringing the perspectives of the public into the shaping of their energy future alongside generative AI development, which will help alleviate some of the biases that may have been unintentionally introduced by our research team.

AI governance and public engagement

Rather than treating generative AI as an inevitable force that society must adapt to, we must acknowledge its deep entanglement with societal values, norms, and ethics. The way forward requires interdisciplinary collaboration, ethical governance, and inclusive public engagement, enabling society to co-create AI applications that are trustworthy and aligned with the public interest45. By embracing these approaches, we can ensure that generative AI serves not just technological progress but also the broader goals of social equity, environmental sustainability, and public trust, especially in critical areas like nuclear energy. In the context of this work, our team envisions that AI governance in nuclear energy applications would include four broader concepts:

  1.

    Interdisciplinary co-creation: Instead of relying on technological determinism, generative AI in nuclear energy should be co-created by experts, AI developers, and the public to reflect diverse societal values and priorities46.

  2.

    Inclusive AI governance: AI governance frameworks must include ethical, social, and normative considerations, with mixed-stakeholder governance to ensure transparency and public trust47. To address bias and social equity, AI models must be regularly audited, particularly with respect to how nuclear energy impacts marginalized communities. This is needed so that the AI tools, once developed, will be viewed as trusted mediums for collective visioning and futuring efforts.

  3.

    Policy recommendations: Practical policies should be adaptive and transparent, and should involve audits and continuous monitoring to ensure the responsible and unbiased use of AI in nuclear energy48. For the specific case of using AI image generators for public engagement on clean energy, audits and monitoring could further build trust in these systems while also ensuring that the images they generate, though intended to depict the future, remain rooted in fact and reality to the extent possible.

  4.

    Public engagement and education: Public engagement should move beyond social acceptance, empowering people to actively shape AI and nuclear energy solutions through educational outreach and co-creation. AI image generators could serve as one element of this broader engaged approach to working towards cleaner energy systems.

In summary, after exploring various generative AI models with a specific focus on nuclear engineering in both a technical and a non-technical sense, we found that a nuclear-specific generative AI model is needed, as current models lack the technical expertise to illustrate nuclear topics accurately beyond the stereotypes. Beyond that, we found that the generative AI models studied struggle to produce images with readable words and human faces, despite slight improvements after applying prompt engineering. We should emphasize, however, that the concerns identified here are specific to the models we tested, even though the pool we tested is large.

Conclusions

In this study, we explored various generative AI models in search of ones that accurately depict scientific and nuclear energy prompts from both a technical and a non-technical perspective. Among 20 tools, we narrowed our focus to DALL-E 2, Craiyon, and DreamStudio for their promising results on general nuclear prompts. Through our exploration, we found that all the models we studied struggle to create images of technical nuclear objects such as a “nuclear reactor core”; more generally, the models struggle with complex objects and technical terminology. We also noticed an overabundance of nuclear cooling towers throughout the research. While cooling towers are the feature of nuclear energy most recognizable to the general public, they do not accurately portray nuclear energy as a whole, which further suggests that a nuclear energy-specific generative AI is needed. This could also be true for other energy systems (e.g., renewables).

Prompt engineering techniques were applied to further optimize the prompts and generate the desired images. We noticed that improved results can be obtained by giving the generative model highly specific prompts, along with substantial descriptions of the visual appearance of the prompt subject. However, improvement was mostly seen for prompts with a large number of related images present on the internet or containing common, more general terms, such as a deer grazing near a cooling tower. The models were still unable to comprehend technical terms related to nuclear engineering or to generate images when multiple nuclear objects were present in the prompt. Though these models are not satisfactory as of now, they may improve significantly if trained on a large data set of nuclear-related images. Furthermore, when accompanied by efforts to optimize prompts, model performance is likely to improve even further.

In light of these findings, our research team’s future work is as follows. It is evident that a specialized text-to-image generative model for nuclear energy requires a greater accumulation of pertinent training data. The variance in data volume across domains introduces substantial performance disparities: as shown in Table 4, all three tools generated images of near-perfect quality for the text “Bunny”, whereas, as evident from Table 6, they produce perplexing images for content requiring nuclear expertise. Ultimately, increased exposure to certain texts during training allows for the refinement of image generation; the more exposure, the more accurate the imagery becomes. Consequently, our research team recognizes the necessity of nuclear-centered generative AI development and intends to pursue this as part of our future work. In addition, a training set of images inclusive of gender, race, and ethnicity would reduce the bias these tools carry. Such a specialized tool will then be tested through social experiments with the public to obtain realistic prompts regarding public concerns about nuclear power and clean energy policy.