Introduction

The increasing adoption of Generative AI for automatic code generation has revolutionized software development. Code generation synthesizes code from a user-given description1. Most modern Integrated Development Environments (IDEs) are also equipped with facilities that generate code automatically from end-user inputs such as field names or function headers1. These facilities are typically based on established, common patterns, such as setters and getters, refactorings, or inheritance. However, such methods may be inefficient: generating setters and getters can be time-consuming, and refactored code has to be tested intensively to confirm that it still works correctly. Therefore, although these approaches are widely adopted, in certain scenarios they remain challenging in terms of both development time and code quality. With machine learning, and Generative AI in particular, developers can automatically produce large parts of a system from high-level descriptions, speeding up software development and making it less exhausting2. Automatic code generation systems are increasingly based on machine learning models, especially deep learning methods, which have led to an accelerated development cycle, reduced human error, and improved productivity1.

However, with the growing complexity and ubiquity of automatic code generation tools comes an increasing security risk. While effective, these tools introduce new security holes that attackers can exploit when the tools are poorly managed3. Automated code generation speeds up development at the cost of various cybersecurity issues. These problems are frequently exacerbated by a lack of human oversight, particularly for Generative AI models that produce code based on patterns observed in vast datasets, some of which may be inadvertently insecure or flawed4. As the use of such systems increases, so does the threat of cybersecurity vulnerabilities5. Automated code generators are therefore liable to contain security holes that attackers can exploit, with potentially disastrous consequences such as security breaches, data loss, or software misbehavior6.

Cybersecurity threats in automatic code generation are diverse, including injection attacks, insecure code templates, backdoors, and adversarial perturbation of machine learning models, among others7,8. These risks are compounded by the complexity and opacity of AI models, which often work as ‘black boxes’ that are difficult to understand in full, making it hard to assess the implications of insecure generated code9. Furthermore, the speed at which code is created can, in some cases, exceed the rate at which traditional security and maintainability practices are adopted, leaving many software systems wide open to attack10.

As organizations increasingly employ AI-based tools to generate code automatically, there is a critical necessity to develop holistic frameworks for the detection and mitigation of cyber risks in these systems. The challenges of automatic code generation and generative AI are largely ignored by traditional cybersecurity solutions, which are designed to address challenges in conventional software development paradigms.

Research problem

Given the above concerns, this paper proposes the Hybrid Artificial Neural Network (ANN)-Interpretive Structural Modeling (ISM) Framework, designed to mitigate cybersecurity risks in automatic code generation environments. The hybrid model employs the predictive strength of ANN and the relationship analysis of ISM to identify, evaluate, and control these risks holistically and proactively. ANNs are known for their capability to discover regularities and potential security vulnerabilities by analyzing large amounts of historical data, including known vulnerabilities, attack surfaces, and secure coding patterns. Complementing the ANN, ISM is a structured, hierarchical modeling technique that distinguishes individual risks and shows their interconnections, giving a precise and understandable picture of cybersecurity vulnerabilities and mitigation policies. Integrating these two methods, ANN for prediction and ISM for structuring the analysis, represents a significant advance in improving the security of automated code generators.

Significance of the study

The proposed Hybrid ANN-ISM Framework has several implications. First, it combines AI-based risk prediction with a structured risk management model, providing a more systematic and understandable solution for securing automatic code generation. Second, it focuses on important cybersecurity vulnerabilities in generative AI, an area where relatively little research has so far addressed its unique security considerations. The framework is also general and can be applied across applications and industries, making it suitable for protecting a variety of AI-based code generators. This paper aims to bridge this gap by proposing a framework that capitalizes on the strengths of ANN and ISM to mitigate cybersecurity risks in automatic code generation, with the broader goal of bolstering the security of AI developer tools, rooting out their vulnerabilities, and making it safer to deploy AI-powered systems in real-world environments.

Objectives of the study

The primary objectives of this paper are:

  • To propose a novel Hybrid ANN-ISM Framework that combines the strengths of ANN and ISM to mitigate cybersecurity risks associated with automatic code generation.

  • To evaluate the effectiveness of the proposed framework through a case study, assessing its ability to address common cybersecurity risks such as injection attacks, insecure code templates, backdoors, and insufficient input validation.

  • To assess the proposed framework’s applicability in real-world scenarios by analyzing the maturity levels of generative AI practices within automatic code generation tools and identifying areas for improvement.

  • To enrich the literature in cybersecurity of AI-based software development by showing the effectiveness of a combined predictive and structural model to alleviate security risks.

Structure of the paper

The structure of the paper is organized as follows:

  • Section “Related work” presents a review of the related work in the field of cybersecurity in automatic code generation, focusing on existing methods, frameworks, and the role of generative AI in software security.

  • Section “Research methodology” provides a detailed explanation of the research methodology and components of the Hybrid ANN-ISM Framework.

  • Section “Results and analysis” discusses the results detailing the hierarchical structure of cybersecurity risks and generative AI mitigation practices.

  • Section “Framework evaluation” presents the evaluation of the Hybrid ANN-ISM Framework for mitigating cybersecurity risks in automatic code generation utilizing generative AI practices.

  • Section “Implications of the study” presents the implications of the study.

  • Section “Limitations of the study” presents the study’s limitations.

  • Section “Conclusion and future research direction” presents the conclusion and direction for future research.

By addressing cybersecurity issues in Generative AI-based automatic code generation, our work paves the way for future research on, and development of, security-aware AI development tools, and should further promote the safe adoption of AI in the software industry.

Related work

As software development increasingly becomes automated, the need to address cybersecurity risks has gained significant attention. Automatic code generation tools powered by Generative AI and other machine learning techniques are designed to streamline the development process. These tools have several benefits: speed, efficiency, and consistency. Nevertheless, they pose new security threats, so there is a growing number of studies on discovering and mitigating them. This section reviews state-of-the-art methods and frameworks and the role of Generative AI in fostering software security for automatic code generation.

Before the advances of AI and machine learning-based approaches, code generation was typically based on pre-existing templates, libraries, and pattern-based approaches. Although these tools fell short in many respects, they usually included basic protection mechanisms such as input validation and crude error handling. Static code analysis tools such as SonarQube and Checkmarx were created to automate the evaluation of source code for potential security weaknesses, helping developers identify issues within produced code (e.g., buffer overflow, SQL injection, and cross-site scripting (XSS) vulnerabilities)11,12. These tools perform static vulnerability analysis, scanning the codebase for common weaknesses without executing it; however, they do not catch all vulnerabilities introduced by automatic code generation tools, and they cannot detect flaws that only manifest at runtime. In addition, manual code reviews and penetration testing have been used to discover vulnerabilities in generated code9. While these approaches are effective in pinpointing security vulnerabilities, they are time-consuming and impractical for large-scale projects, particularly those relying on generated code, as procedure-based testing quickly becomes overwhelmed by the sheer amount of code9.

As the limitations of traditional approaches became obvious, new automatic methods were developed. Work has concentrated on seamlessly incorporating secure coding practices and automatic vulnerability checking within the code generation process. For instance, secure coding templates can be embedded in the automatic generation framework, so that generated code follows security best practices such as input sanitization and careful handling of sensitive data. Products like Secure Code Warrior offer curated code snippets and fixers that automatically replace insecure constructs13,14,15. Automated penetration testing tools have also been adapted to generated source code. Vendors such as Veracode and Snyk, for example, have extended their toolsets to automatically scan codebases for bugs introduced by auto-generated code and third-party dependencies16,17,18,19,20. These scanners are typically integrated directly into CI/CD pipelines, guaranteeing that security checks are executed at each stage of development.

The arrival of Generative AI has changed the game in automatic code generation, with new abilities not just to generate but also to protect the code. Generative AI systems, in particular those based on deep learning, such as transformer models (e.g., OpenAI’s GPT and Google’s BERT), have proven highly efficient at generating code from high-level user inputs21,22. On the one hand, this has made the software development process much faster; on the other hand, generative models introduce new security concerns to be aware of4,23. One of the significant issues with AI-generated code is that it is inherently opaque. Black-box models such as deep neural networks and large language models (LLMs) used for code generation risk producing exploitable output without revealing how individual vulnerabilities or design flaws are injected into the code24. This lack of interpretability makes it challenging for developers to determine whether AI-created code follows secure coding principles or contains concealed vulnerabilities, including insecure API invocations and vulnerable data manipulation logic, among others25.

New research has started tackling this issue using AI models for security risk detection. For instance, AI-based code analysis tools have been constructed to audit generated code for security vulnerabilities in real time10. In related work, models such as CodeBERT and graph neural networks (GNNs) have been trained on large code bases to learn patterns associated with flaws, so that vulnerabilities in AI-generated code can be identified26. These solutions enable AI-generated code to be automatically screened and flagged for possible issues before applications are deployed, helping to minimize the chances of security gaps. Furthermore, adversarial machine learning has also been investigated for the security of AI-supported code generation27. Generative models, like any machine learning system, are exposed to adversarial attacks. Attacks that add small perturbations to the training data can modify the generated code to inject subtle vulnerabilities that bypass conventional security checks28. To combat this, efforts are now being made to train generative models with adversarial defenses, improving their resilience against such attacks. Adversarial training addresses both the generation of secure code and the detection of attacking code at the generation phase by training the model on adversarial inputs29.

Hybrid approaches that combine traditional security technologies with the power of Generative AI have become popular in recent years10,21,30,31. For instance, the Hybrid ANN-ISM Model proposed in this paper combines an Artificial Neural Network (ANN) to predict security risks with Interpretive Structural Modeling (ISM) for structured risk analysis and mitigation. By integrating AI-based prediction models with systematic risk assessment, the hybrid approach provides a powerful alternative for securing code generation. Other hybrid approaches combine static and dynamic analysis23. For instance, pairing an AI-assisted static code analysis solution with a dynamic application security testing (DAST) tool subjects the generated code to multi-level scrutiny both before deployment and at runtime32. This layered combination provides the benefit that vulnerabilities overlooked in one layer are caught by another33,34.

The continuing advances in AI-aided tools and frameworks for code generation security point to several prospective research directions. One promising approach is to leverage reinforcement learning to iteratively enhance the security of generated code on the fly, dynamically adjusting security protections at runtime according to real-world deployment feedback. In addition, federated learning could enable decentralized AI models that improve security and privacy without requiring direct access to sensitive code generation data. The increasing interest in Explainable AI (XAI) in the code generation domain is also significant. Research in this area is moving towards making AI-driven code generation tools more interpretable, so developers can understand why specific lines of code were introduced and which security decisions were made. This will allow more trustworthy use of these tools in production settings where security and transparency matter.

Although Generative AI enables this new kind of automation, it also adds significant cybersecurity risks that must be addressed. Traditional security techniques have been grafted onto the problem of “code that writes code,” but emerging AI-based systems are so complex and opaque that entirely new security paradigms are needed. In effect, the role of generative AI in software security is both transformative and daunting, and future research on hybrid solutions, adversarial resilience, and explainability will be crucial to ensure that these technologies can be used safely and securely in practice.

While advanced AI-based models such as CodeBERT and Codex exhibit remarkable capabilities in code prediction and generation, they provide limited transparency and structural understanding of how different cybersecurity risks interact. In contrast, conventional hybrid methods offer valuable interpretive and relational insights through causal or dependency mapping, yet they typically lack the ability to quantitatively evaluate or forecast the magnitude of such risks. This disparity highlights the necessity for a framework that integrates both predictive analytics and structural reasoning. The proposed ANN–ISM hybrid model fulfills this need by combining the learning and prediction strengths of ANN with the hierarchical analysis power of ISM, enabling it to both quantify cybersecurity risks and elucidate their interconnections. Consequently, this approach provides a more comprehensive and interpretable solution for managing cybersecurity threats within Generative AI–driven code generation environments.

Research methodology

In this study, we follow a comprehensive six-phase approach (see Fig. 1) to verify and validate our proposed Hybrid ANN-ISM Framework for reducing cybersecurity risks in automated code generation. The first phase comprises a multivocal literature review (MLR), bringing in perspectives from various sources of knowledge and building a strong base for the study. Phase 2 is a field experiment (online questionnaire survey), in which we collect practitioners’ opinions to understand problems and points of view on the matter. The third phase is an expert panel review to optimize the draft framework through the panel’s collective professional wisdom. In the fourth phase, a model is proposed for predicting cybersecurity risks using an Artificial Neural Network (ANN). The fifth phase uses ISM for deeper analysis and for structuring the relationships among risk factors. Finally, in the sixth phase, a case study is conducted to assess whether the proposed approach is feasible and effective in a real-world situation. This systematic process guarantees a comprehensive and consistent investigation of the framework’s capability to mitigate cybersecurity risk in automatic code generation.

Fig. 1
figure 1

Research flow framework.

Phase 1: multivocal literature review (MLR)

Multivocal literature review (MLR) is a comprehensive and systematic literature review from more than one perspective, voice, and source35,36,37. It represents a spectrum of perspectives, approaches, and results across a field. For this paper, an MLR would entail accessing information from various sources, including peer-reviewed papers, conference papers, industry reports, white papers, and expert opinions. The MLR would focus on cybersecurity risks and Generative AI practices related to automated code generation.

The specific steps followed in this paper to perform the MLR are as follows35,38:

Defining the research questions and scope

  • Establish key research questions: The first step in MLR is to define the principal questions of the study. Here, the main research questions are:

    • What are the primary cybersecurity risks associated with automatic code generation?

    • Which Generative AI practices and strategies are best suited to mitigate these risks?

  • Determine the scope: This involves deciding on the boundaries of the review by defining the specific generative AI technologies considered (for example, input validation and sanitization, GANs) and the particular cybersecurity risks covered (e.g., injection attacks, code quality and logic errors, backdoors and malicious code).

Searching for sources

Find sources: We look for a diverse range of sources:

  • Academic references: Papers on cybersecurity, automatic code generation, AI, and Generative Models from high-impact journals and conferences, such as:

    • IEEE Transactions on Cybersecurity

    • Journal of Experimental and Theoretical Artificial Intelligence (JETAI)

    • ACM Computing Surveys

    • International Journal of Information Security

    • Security and Privacy (Wiley)

  • Industry reports: Announcements from cybersecurity firms, technology companies, and research institutions. Research papers, reports, and white papers from cybersecurity companies and think-tanks, and organizations like:

    • Gartner

    • McKinsey and Company

    • OWASP (Open Web Application Security Project)

    • ISACA (Information Systems Audit and Control Association)

    • National Institute of Standards and Technology (NIST)

  • Government and regulatory sources: Documents from government departments or standards bodies, such as:

    • EU GDPR Reports

    • U.S. Cybersecurity and Infrastructure Security Agency (CISA) Advisories

  • Employ databases: Widely used academic databases such as:

    • Google Scholar, IEEE Xplore, SpringerLink, ACM, Scopus, etc.

  • Search criteria: Queries were built around specific search terms such as:

    • “cybersecurity risks in automatic code generation”, “Generative AI practices”, “generative models and vulnerabilities”, “risk mitigation in automatic code generation”

The PRISMA Flowchart of the final sample size is shown in Fig. 2.

Fig. 2
figure 2

PRISMA flowchart for final sample size.

Screening and selecting sources

  • First-level screening: We screen abstracts and titles to include relevant and reliable sources.

  • Inclusion criteria:

    • Literature regarding cybersecurity and AI in the realm of automatic code generation, papers that present solutions to mitigate the identified risks.

    • Recent research papers—Within the past 5–10 years.

    • Consider both cybersecurity threats and AI-specific mitigations.

    • Studies in high-quality peer-reviewed journals and conference proceedings.

    • Updates from reputable cybersecurity firms.

    • Resources and references about generative AI techniques, approaches, models, practices, etc.

  • Exclusion criteria:

    • Non-relevant content about cybersecurity risks or generative AI in automatic code generation.

    • Papers over 10 years old (unless they are seminal).

    • Non-peer-reviewed sources or opinion pieces.

Data extraction and synthesis

We extract the following information from the selected papers:

  • Cybersecurity risks found: What are the primary cybersecurity risks mentioned in association with code automation?

  • Mitigation approaches: What are the proposed generative AI practices, methods, or technologies to mitigate risks?

  • Emerging trends: We seek new or novel approaches to secure automated code generation against cybersecurity threats.

  • Challenges and gaps: We discuss areas in which our literature review highlighted potential gaps in, or limitations of, the current literature.

  • Categorize results: We classify our findings under several categories, such as Cybersecurity risks (e.g., data poisoning, adversarial attacks), Model robustness, and security protocols—best practices and recommendations in generative AI for safeguarding automated code generation.

Analysis and thematic clustering

Identify themes and variations within themes: After organizing the data, we looked for themes and variations within themes across sources. For example:

  • Security threats in automatic code generation.

  • Threats of the abuse of generative AI in generating deepfakes or counterfeit content.

  • Proactive strategies for mitigating bias include adversarial training, model verification, and an AI ethical framework.

  • Cross-source comparisons: We contrasted sources’ conclusions about cybersecurity risks and mitigation strategies.

Synthesizing results and presenting findings

  • Provide a holistic view: We consolidated the most frequently mentioned cybersecurity risks and the generative AI mitigation practices found across all sources, and explain the significance of these observations for cybersecurity issues and solutions in generative AI.

  • Draw attention to research gaps: We identify unexplored and under-researched topics useful for future research, such as the need for more evidence, new mitigation approaches, or joint academia-industry partnerships.

  • Discussion of limitations: We discussed limitations of the current review (e.g., potential bias of sources, restricted availability of databases, or absence of research).

Formulating implications and recommendations

  • Develop implications: Drawing from the synthesis, we suggest practical implications for research and practice. For example, it might indicate where additional work is required before we can safely rely on automatic code generation.

  • Practice implications: We provide practical recommendations for managing cybersecurity risks in automatic code generation, such as introducing specific security standards, regulatory frameworks, or audits of generative AI.

Writing and structuring the literature review

  • Introduction: We present a discussion about cybersecurity risks and generative AI practices importance and limitations in automatic code generation.

  • Methods: We describe how the MLR was performed and why various voices and sources are included.

  • Main body: We present the results under three main themes (exposure, mitigation, and challenges).

  • Conclusions: We conclude by summarizing the main findings, identifying research gaps, and making recommendations for future research and practice.

By taking these steps, the MLR contributes to a well-informed and balanced discussion of cybersecurity risks and generative AI methods for addressing them, in which various voices and perspectives are heard. This will be beneficial both academically and by transferring knowledge between academia, industry, and practical application.

Phase 2: online questionnaire survey

The second step of this research was to design a questionnaire survey; several essential elements were considered to develop a comprehensive and useful instrument. The main objective of the survey is to find out what sorts of cybersecurity risks exist in automatic code generation; a secondary objective is to understand how generative AI can be applied to address these risks. The following steps were followed in this survey39,40,41,42,43:

  • First of all, we set a target audience for this survey. We chose cybersecurity researchers, software developers, AI researchers, and users and developers of automatic programming tools. All participants have some experience with AI, code generation, and cybersecurity, which guarantees the answers are informed and applicable to the study’s objectives. The final sample size of participants in this survey is 70.

  • The next stage was the construction of the questionnaire itself. The survey covers various categories of questions: demographic questions, cybersecurity risk identification questions, questions on AI practices for risk mitigation, and questions on technology and tools in the field. Demographic background questions collect basic information about the sample, such as respondents’ role in the industry and their working experience. For example, we inquire:

    • How would you describe your role in the organization?

    • How many years of experience do you have in software development/AI/cybersecurity?

  • Furthermore, our questionnaire included cybersecurity risk identification questions to evaluate participants’ knowledge of typical security risks related to automatic code generation, e.g., code injection, data leakage, and insecure APIs. For example, one question asks:

    • What are the most common cybersecurity risks that exist in automatic code generation?

  • Fig. 3 presents descriptive statistics of the questionnaire respondents. Questions on AI practices for risk reduction are indispensable for understanding how generative AI is employed to counter threats. We included further questions such as:

    • Have you heard of any AI-driven methods for finding vulnerabilities in code?

    • What are your thoughts on how generative AI can help counter the cybersecurity challenges associated with automatic code generation?

  • In addition, technology and tooling-related questions were applied to determine what platforms are in use in automatic code generation and whether they incorporate any AI-related security capabilities.

  • The survey format consisted of an introduction, which was a short profile of the study giving a précis of the survey’s intent, how the responses would be used, and the time needed to complete the instrument. Also included was an informed consent statement explaining that the information provided is confidential and that participants’ identities will remain anonymous. The survey was structured according to the same topics: demographics, cybersecurity risks, AI practices, and technology/tools. At the end, participants were thanked and, where relevant, informed of any subsequent events (e.g., a presentation), such as the sharing of results.

  • Choosing the right survey tool is also significant. We used online tools such as Google Forms to produce and circulate the survey; these tools allow surveys to be developed, distributed, and analyzed with different levels of functionality. After choosing the survey instrument, a pilot test was performed with a limited number of subjects. This trial was used to identify problems with question clarity, survey length, or tool usability, and to ensure that the survey would function properly once disseminated more broadly.

  • After the pilot testing was undertaken and any problems rectified, the survey was distributed to the larger population of interest. Here we used professional networks such as LinkedIn, GitHub, Stack Overflow, and AI/cybersecurity forums to reach appropriate respondents. We also committed to a specific survey completion deadline to ensure that data collection finished on schedule.

  • Data were then collected and analyzed. We tracked the responses to check that everything was going well during the survey. Subsequently, the data were examined for trends, patterns, and nuances. Quantitative data were analyzed using statistical packages such as SPSS and Excel, while the open-ended responses were analyzed through thematic analysis to identify themes. This enables us to interpret the survey findings.

  • Lastly, the results were presented systematically in this paper. The reporting concentrates on the top cybersecurity concerns of the respondents and offers insight into the generative AI methods most frequently employed to counter those concerns. Ethical issues were also addressed by maintaining the privacy and confidentiality of the respondents. The study follows the ethical principles of research involving human subjects, including obtaining informed consent and maintaining confidentiality.

Fig. 3
figure 3

Demographic details of survey experts.

Phase 3: expert panel review

An expert panel review was performed to assess the research presented in this paper. The panel comprised 19 experts from multiple domains, including cybersecurity, AI, software development, and automated code generation tools. They came from several sectors: academia, industry, and research labs, with a mix of professionals having experience in:

  • Cybersecurity and risk management practice requirements

  • AI-driven products, particularly those using generative AI technologies

  • Experience in software development, preferably automatic code generation

  • Ethical implications of cybersecurity and AI

Between them, these individuals have over a decade of experience; many hold advanced degrees and have held professional leadership positions in their particular areas. The study design follows a rigorous process involving sequential Delphi rounds. In each round, the experts scrutinize the MLR and its results, critically appraising them and providing extensive feedback on the research design and areas for potential improvement. Relevant input from the expert panel was thoroughly integrated into the research, which sharpened the research questions and provided a more specific scope for the ANN-ISM framework.

The experts assessed the cybersecurity risks on a holistic risk scale. Scores of 1 and 10 were assigned to risks of perceived low (~ 5%) and medium (~ 45–50%) importance, respectively, and the remaining risks were scored in 5 percent increments, creating a stepwise response scale. These expert judgements were used to construct pairwise matrices that encapsulate the interrelations among the cybersecurity risks in automatic code generation presented in Table 1.

Table 1 Cybersecurity risks in automatic code generation.
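As an illustration of how such expert judgements can be turned into a pairwise matrix, the sketch below aggregates hypothetical expert ratings and thresholds them into a binary relation matrix. The rating array, the number of risks, and the cut-off value are placeholder assumptions, not the exact procedure or data used in this study.

```python
import numpy as np

# Hypothetical ratings: each expert scores how strongly risk i influences risk j (0-10).
# Shape: (n_experts, n_risks, n_risks); the values here are random placeholders.
rng = np.random.default_rng(1)
n_experts, n_risks = 19, 4
ratings = rng.integers(0, 11, size=(n_experts, n_risks, n_risks))

# Aggregate expert judgements by averaging, then threshold to a binary "influences" relation.
mean_scores = ratings.mean(axis=0)
threshold = 5.0                                   # assumed cut-off for "risk i drives risk j"
pairwise = (mean_scores >= threshold).astype(int)
np.fill_diagonal(pairwise, 1)                     # each risk trivially reaches itself

print(mean_scores.round(1))
print(pairwise)
```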

To ensure the reliability and validity of the research model, ANN and ISM were used as two complementary analytical methods. These analyses enable a more in-depth understanding of the results, which helps to demonstrate the face validity of the findings and ultimately strengthens the research process.

Phase 4: artificial neural network (ANN)

The ANN process was applied in the fourth stage of this research. ANN is easy to adapt to new data sets, and one of its advantages is that it can work with incomplete or missing data inputs44. ANN predictions are, as a rule, superior to those of other techniques such as SEM, multiple linear regression, MDA, and binary logistic regression. Interpretive Structural Modeling (ISM) is commonly employed to identify the implications of predictors on a dependent variable. Still, linear techniques like ISM have limitations in capturing the non-linear nature of human decision-making because they neglect higher-order relations45,46. ANN, being a well-known AI model, can overcome this drawback by mimicking decision-making scenarios and nonlinear relationships, as emphasized by Leong et al.45. ANN’s multi-layer perceptron structure simulates the relationship between inputs and outputs, similar to how the human brain operates. An important aspect of ANN is its capability to model nonlinear and non-compensatory links between attributes47.

In summary, ANN models are more accurate than classical linear approaches and provide remarkable flexibility and generalizability64. However, ANN is inappropriate for attributive analysis or hypothesis testing45,48. To address this problem, a two-stage approach based on the integration of ISM with ANN has been proposed.

ANN training

When an ANN is trained, we model intrinsic relationships between inputs and outputs by adjusting its internal weights, and the input/output pairs are shown as:

$$S = \left( d_{1}, x_{1} \right), \left( d_{2}, x_{2} \right), \ldots, \left( d_{N_{i}}, x_{N_{i}} \right)$$
(1)

The input parameters, referred to as \(x_{i}\), and corresponding output responses, denoted \(d_{i}\), comprise a random sample. These data display the inherent non-linear correspondence between inputs and outputs. The objective is to build an ANN model capable of learning this relationship independently. Typically, the output of the ANN is written as:

$$y = y\left( x, w \right)$$
(2)

where y is the ANN output, x denotes the input parameters, and w indicates the unknown weights. The optimal weights can be found by solving an optimization problem that minimizes the disparity between the predicted output and the real label. This optimization can be formulated as:

$$w^{*} = \min_{w} E_{T} = \min_{w} \sum\nolimits_{i} \left\| d_{i} - y\left( x_{i}, w \right) \right\|$$
(3)

where \(E_{T}\) represents the error over the sample. There are many ways to approach this problem, and the most widely known is backpropagation, proposed by Hertz et al.49. This method adjusts the weights of the network by computing the estimated gradient of the error function with respect to the weights, which leads to better predictions:

$$\omega_{{\text{next}}} = \omega_{{\text{now}}} - \eta \frac{\partial E_{T}}{\partial \omega}$$
(4)

Hertz et al.49 denote the learning rate by η. Initially, the weights are chosen randomly, and the algorithm is repeated until the optimization condition of Eq. (3) is satisfied. Weights and biases are updated during this process, minimizing the mean squared error and allowing the model to attain the target accuracy.

The weights \(w_{ij}\) and biases \(b_{i}\) are adjusted until the model obtains the desired accuracy. Alnaizy et al.50 describe the calibration procedure as:

$$V_{i} = \sum\limits_{j = 1}^{n} w_{ij} y_{j} + b_{i}$$
(5)

The bias \(b_{i}\) adjusts the weighted sum of inputs. A transfer (activation) function is then applied to the sum \(V_{i}\). This transformation produces the output:

$$Z_{i} = f\left( V_{i} \right)$$
(6)
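
To make Eqs. (1)–(6) concrete, the following minimal sketch trains a small single-hidden-layer network with plain gradient descent (backpropagation) on synthetic data. It is an illustrative NumPy implementation under assumed layer sizes, learning rate, and data, not the exact network used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sample S = {(d_i, x_i)}: 14 input features, 1 output (Eq. 1)
X = rng.normal(size=(200, 14))
d = (X.sum(axis=1, keepdims=True) > 0).astype(float)  # toy target

# Randomly initialised weights and biases (Eq. 5: V_i = sum_j w_ij y_j + b_i)
W1, b1 = rng.normal(scale=0.1, size=(14, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))  # activation f in Eq. (6)
eta = 0.05                                    # learning rate (Eq. 4)

for epoch in range(500):
    # Forward pass: Z = f(V) at each layer (Eqs. 5-6)
    V1 = X @ W1 + b1; Z1 = sigmoid(V1)
    V2 = Z1 @ W2 + b2; y = sigmoid(V2)

    # Prediction error, whose squared sum plays the role of E_T (Eq. 3)
    err = y - d

    # Backpropagation: gradients of the error with respect to the weights (Eq. 4)
    dV2 = err * y * (1 - y)
    dW2, db2 = Z1.T @ dV2, dV2.sum(axis=0, keepdims=True)
    dV1 = (dV2 @ W2.T) * Z1 * (1 - Z1)
    dW1, db1 = X.T @ dV1, dV1.sum(axis=0, keepdims=True)

    # Weight update: w_next = w_now - eta * dE/dw (Eq. 4)
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

print("final mean squared error:", float(np.mean(err ** 2)))
```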

Performance of ANN training

The performance of the ANN is evaluated using the Root Mean Squared Error (RMSE), the R-squared (R2), and the Average Absolute Deviation (AAD), expressed as:

$$\text{RMSE} = \left[ \frac{1}{n}\sum\limits_{i = 1}^{n} \left( Y_{i} - Y_{id} \right)^{2} \right]^{0.5}$$
(7)
$$R^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{n} \left( Y_{i} - Y_{id} \right)^{2} }{\sum\nolimits_{i = 1}^{n} \left( Y_{i} - Y_{m} \right)^{2} }$$
(8)
$$AAD = \left[ \frac{1}{n}\sum\limits_{i = 1}^{n} \frac{\left( Y_{i} - Y_{id} \right)}{Y_{id}} \right] \times 100$$
(9)

where \(Y_{id}\) is the observed data, \(Y_{i}\) is the predicted data, \(Y_{m}\) is the median of the observed data, and \(n\) is the total number of data points.
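The three evaluation measures in Eqs. (7)–(9) can be computed directly from predicted and observed values, as in the sketch below. The sample arrays are illustrative placeholders; the absolute value in the AAD helper reflects the metric’s name and is an assumption about the intended formula.

```python
import numpy as np

def rmse(y_pred, y_obs):
    """Root Mean Squared Error, Eq. (7)."""
    return np.sqrt(np.mean((y_pred - y_obs) ** 2))

def r_squared(y_pred, y_obs):
    """Coefficient of determination R^2, Eq. (8)."""
    ss_res = np.sum((y_pred - y_obs) ** 2)
    ss_tot = np.sum((y_obs - np.median(y_obs)) ** 2)  # Y_m: median of observed data, as in the text
    return 1.0 - ss_res / ss_tot

def aad(y_pred, y_obs):
    """Average Absolute Deviation in percent, Eq. (9); absolute value assumed."""
    return np.mean(np.abs(y_pred - y_obs) / y_obs) * 100

# Illustrative values only
y_obs = np.array([0.62, 0.71, 0.55, 0.80, 0.67])
y_pred = np.array([0.60, 0.74, 0.52, 0.78, 0.70])
print(rmse(y_pred, y_obs), r_squared(y_pred, y_obs), aad(y_pred, y_obs))
```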

Phase 5: interpretive structural modeling (ISM)

The ISM approach was applied in the fifth stage to classify and rank the identified cybersecurity risks in automatic code generation. The ISM method, explained in51, was introduced to analyze and understand complex relationships among systems and subsystems. By organizing elements into a hierarchy, this method helps capture the variations among, and the directions of influence between, various elements. Further, ISM can model these relationships using both visual and structured language52. The method is compelling for investigating complex multivariate interactions53,54,55 and has been used in many studies to better understand complex systems56,57,58,59,60,61,62,63. Figure 1 shows how ISM is used in this study to map and classify cybersecurity risks in automatic code generation.
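As a concrete illustration of the ISM mechanics, the sketch below derives a final reachability matrix (adding transitive links) from a binary initial reachability matrix, such as one obtained from the expert pairwise matrices of Phase 3, and then performs level partitioning. The three-risk example matrix is hypothetical and is not the matrix analyzed in this study.

```python
import numpy as np

def transitive_closure(m):
    """Final reachability matrix: add implied links (Warshall-style transitivity)."""
    r = m.copy().astype(bool)
    n = len(r)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                r[i, j] = r[i, j] or (r[i, k] and r[k, j])
    return r.astype(int)

def level_partition(reach):
    """Assign each element an ISM level via reachability/antecedent set intersection."""
    n = len(reach)
    remaining, levels, level = set(range(n)), {}, 1
    while remaining:
        current = []
        for i in remaining:
            reach_set = {j for j in remaining if reach[i, j]} | {i}
            antecedent = {j for j in remaining if reach[j, i]} | {i}
            if reach_set == reach_set & antecedent:   # top-level condition R(i) ⊆ A(i)
                current.append(i)
        for i in current:
            levels[i] = level
        remaining -= set(current)
        level += 1
    return levels

# Hypothetical initial reachability matrix for three risks (1 = "risk i drives risk j")
initial = np.array([[1, 1, 0],
                    [0, 1, 1],
                    [0, 0, 1]])
final = transitive_closure(initial)
print(final)
print(level_partition(final))
```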

Phase 6: development, implementation, and validation of the hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation with the use of generative AI practices

In the final phase of this research, all the findings of phases 1–5 were merged to develop the hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI practices. The proposed model was then implemented in an organization and was validated through a case study. Further details are presented in Sect. “Framework evaluation”.

Results and analysis

Cybersecurity risks in automatic code generation

It is essential to detect cybersecurity risks in automated code generation and to mitigate them with AI approaches; otherwise, AI-based software development tools may increase the risk of introducing hidden, and thus potentially exploitable, vulnerabilities. Unguarded automated code generation can result in code that contains built-in vulnerabilities, security holes, or dependencies on outdated or unsafe libraries, potentially compromising application integrity, confidentiality, and availability. Identifying these risks early allows organizations to adopt AI-driven security activities, such as secure coding guidelines, adversarial testing, and continuous validation, to achieve elevated security standards in the generated code. This not only protects the software from possible security leaks but also reduces the adverse effects of long-term cybersecurity threats, and it offers a secure and stable environment for using AI technologies in software development. Table 1 presents the cybersecurity risks identified through the literature review and survey.

Statistical analysis of cybersecurity risks in automatic code generation as identified through the MLR and the real-world study

To conduct a sound statistical analysis based on the data from the MLR and the real-world study in Fig. 4, we examine descriptive statistics (mean, standard deviation), correlation, and a comparison of means:

  • Mean = \(\mu = \frac{{\sum X_{i} }}{n}\)

    • MLR Mean = 74.43%

    • Real World Study Mean = 76.0%

  • Standard Deviation = \(\sigma = \sqrt {\frac{{\sum \left( {X_{i} - {\upmu }} \right)^{2} }}{n}}\)

    • MLR Standard Deviation = 9.93%

    • Real World Study Standard Deviation = 8.22%

  • Pearson Correlation Coefficient = \({\text{r}} = \frac{n \sum X_{i} Y_{i} - \sum X_{i} \sum Y_{i} }{\sqrt{\left[ n\sum X_{i}^{2} - \left( \sum X_{i} \right)^{2} \right]\left[ n\sum Y_{i}^{2} - \left( \sum Y_{i} \right)^{2} \right]}}\)

    • MLR and Real World Study Correlation (r) = 0.94

    • The observed correlation is strongly positive (close to 1), which indicates that the impact ratings from the Multivocal Literature Review and the Real World Study move in the same direction.

  • T-Test for Comparing Means: We use an independent t-test to compare whether the means of the two samples are genuinely different. This test allows us to determine whether the difference in impact rates between the two sets could have arisen by chance.

    • T-Test Formula (for two independent samples) = \({\text{t }} = \frac{{\mu_{1} - \mu_{2} }}{{\sqrt {\frac{{S_{1}^{2} }}{{n_{1} }} + \frac{{S_{2}^{2} }}{{n_{2} }}} }}\)

    • T-Statistic = -0.44

    • P-Value = 0.66

    • The p-value is 0.66, higher than the significance level (0.05). Hence, we do not reject the null hypothesis: there is no statistically significant difference between the means of the two samples. A short computational sketch of these calculations follows this list.
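The sketch below reproduces the descriptive statistics, correlation, and two-sample t-test reported above (Welch’s variant, matching the unequal-variance formula). The two arrays are placeholders standing in for the per-risk impact percentages from the MLR and the survey.

```python
import numpy as np
from scipy import stats

# Placeholder impact percentages per cybersecurity risk (MLR vs. real-world survey)
mlr = np.array([72, 80, 68, 85, 60, 75, 78, 70, 82, 65, 77, 74, 79, 76], dtype=float)
survey = np.array([74, 82, 70, 86, 63, 76, 80, 72, 84, 66, 78, 75, 81, 77], dtype=float)

mean_mlr, mean_survey = mlr.mean(), survey.mean()
std_mlr, std_survey = mlr.std(), survey.std()              # population std, as in the formula
r, _ = stats.pearsonr(mlr, survey)                          # Pearson correlation coefficient
t_stat, p_value = stats.ttest_ind(mlr, survey, equal_var=False)  # independent two-sample t-test

print(f"means: {mean_mlr:.2f} vs {mean_survey:.2f}")
print(f"std devs: {std_mlr:.2f} vs {std_survey:.2f}")
print(f"r = {r:.2f}, t = {t_stat:.2f}, p = {p_value:.2f}")
```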

Fig. 4
figure 4

Impact percentages of cybersecurity risks identified through MLR and survey in automatic code generation.

The means of the two sets are similar, with a slight difference (74.43% vs 76.00%) that is not statistically significant according to the t-test. The two sets are also highly correlated (0.94), implying that the MLR and the real-world study follow the same trend, while their absolute impact percentages do not differ radically.

Analysis of cybersecurity risks in automatic code generation based on software development organization sizes

Figure 5 presents the effect of some cybersecurity threats on automatic code generation for three company sizes: small, medium, and large. All cybersecurity risks (CRs) are depicted on the x-axis, while vertical bars indicate the impact percentage for small, medium, and large companies. This makes it possible to compare how these risks manifest against various sizes of software development organizations.

Fig. 5
figure 5

Cybersecurity risks in automatic code generation and its impact on company sizes.

Looking at overall trends, we can see that small firms (the blue bars) have the highest impact percentages across the various cybersecurity risks. This may indicate that smaller companies face relatively higher cybersecurity challenges because they have fewer available resources, weaker security infrastructures, and fewer cybersecurity professionals. Medium-sized companies (the green bars) occupy the middle ground: their exposure is moderate because they have some security measures in place, but they remain at risk as their complexity and scale increase. By contrast, larger firms, depicted in red, show much lower impact percentages. In other words, although large companies still face cybersecurity risks, they are usually stronger organizations with more resources and better-established security procedures to manage those risks.

Some particular risks offer more context on how the ill effects are distributed across firm sizes. For instance, Injection Attacks (CR1) have a high rate of influence on all company sizes, and mostly on small companies. This is not much of a surprise, as small companies often lack sophisticated security mechanisms and substantial code review to protect against this kind of attack. Likewise, Code Quality and Logic Errors (CR2) have the highest impact on small companies, possibly because they have less formalized coding and quality assurance methods; bigger companies can minimize this risk through more rigorous testing and review processes. Backdoors and Malicious Code (CR3) also significantly impact small organizations, which might indicate that small companies have poor control over third-party libraries. Small companies also demonstrate an intense vulnerability to Insufficient Input Validation (CR5), reflecting the impact of having fewer developers and of using automatic generation tools that may not have been deeply tested against edge cases.

Other risks, e.g., Weak Authentication and Authorization Mechanisms (CR6), diminish in impact with increasing company size. Big companies have more robust ways to secure authentication for their systems, while smaller companies may lack those protections. A Lack of Encryption or Insecure Data Handling (CR7) is also risky for small companies, where the risk is higher due to an insufficient security budget.

A general trend can be observed across the entire data set: as company size increases, the impact percentages decrease. This supports the perception that the bigger the company, the better prepared it is to manage cyber risks, thanks to better security defences, dedicated resources, and built-in processes.

Small companies were found to be the most vulnerable to security problems that arise with automatic code generation: they lack funds, knowledge, and protective mechanisms. Mid-sized companies have made modest progress from a risk-management standpoint but remain highly exposed. Larger companies, on the other hand, usually experience less sweeping effects overall, owing to bigger security budgets, better-prepared personnel, and more extensive security procedures. However, this does not mean they are safe from risks, especially those related to their complexity. The graph effectively demonstrates the correlation between corporate size, investment in security infrastructure, and the ability to manage cybersecurity risks related to automatic code generation.

Analysis of cybersecurity risks in automatic code generation based on software development organization continents

Figure 6 is a line chart depicting the global landscape of cybersecurity risks in automatic code generation by continent (North America, Europe, Asia, Africa, South America, and Oceania). Each line indicates the percentage impact of each cyber risk, with the y-axis being the severity of the dangers across multiple regions.

Fig. 6
figure 6

Analysis of cybersecurity risks in automatic code generation based on various continents.

Asia has the highest exposure to cybersecurity risk vectors, including Injection Attacks, Backdoors, Malicious Code, and Adversarial Attacks on AI Models, which implies that much remains to be done to tackle these risk vectors. This increased impact results from the rapid adoption of AI without well-established cybersecurity defenses, in addition to the vast technology gap between countries in Asia.

Africa also records high impact rates for many risks, notably Insufficient Input Validation, Weak Authentication, and Reusability of Vulnerable Code. This high rate is understandable because investment in security infrastructure is low and many African countries use outdated systems and technologies. Adversarial Attacks on AI Models also have a rather significant impact, reflecting the potentially increasing difficulty of securing AI-based systems.

North America and Europe have lower impact ratios due to the stronger cybersecurity practices, laws, and tech deployed in these regions. As the chart demonstrates, these regions can better manage risks like Weak Authentication, No Encryption, and Privacy Issues thanks to a mature cybersecurity posture, as seen with GDPR in Europe and regulatory authorities in North America.

In South America, impact percentages are only moderate, though some threats, such as Privacy Issues and Data Leakage, are relatively higher than others. This indicates that even as certain South American countries progress in cybersecurity, data protection and systems integration remain issues.

Oceania, which includes countries such as Australia and New Zealand, shows modest effects similar to those in South America. The region is among the most technologically advanced in the world, but it is still vulnerable to Insecure Integration with Other Systems and Insufficient Logging and Monitoring. These risks are a reminder that even affluent regions can struggle to cover complex software systems adequately or to perform complete security monitoring effectively.

Figure 6 demonstrates the differing levels of cybersecurity risk exposure and mitigation measures globally. Some regions manage cybersecurity risks better than others, with North America and Europe facing smaller risks while Asia and Africa face greater ones, mainly due to obsolete infrastructure, resource constraints, and the rapid adoption of new technologies. This visualization clearly emphasizes the importance of securing automatic code generation and AI-based technology, and of global collaboration to build better cybersecurity infrastructure, particularly in emerging market and developing economies (EMDEs), to counter the evolving threat landscape.

Generative AI practices and tools for addressing cybersecurity risks in automatic code generation

Table 2 presents the different Generative AI techniques and tools intended to mitigate the various types of cybersecurity threats caused by the automatic generation of code. These practices and tools apply across many diverse areas, with special emphasis on reducing the attack surface and increasing the security, quality, and operational integrity of the code.

Table 2 Generative AI practices for addressing cybersecurity risks in automatic code generation.

Here is the breakdown of key areas in Table 2:

  • Injection attacks: Input validation and sanitization are among the best practices to avoid injection attacks, ensuring that data inputs do not contain potentially harmful commands; tools such as TensorFlow and OpenAI GPT-3 help automate many of these validations (a minimal validation sketch appears at the end of this subsection). These practices, together with code obfuscation and encryption (Jscrambler, CodeShield), also increase the difficulty for adversaries conducting automated attacks on generated code. Mandatory static code analysis for vulnerability detection, with tools such as Checkmarx and Snyk, is also vital for finding vulnerabilities in code integrations early.

  • Quality of code, logic, and flexibility: Static code analysis, automated unit testing, and flaw detection with AI models are among the practices recommended in the table for improving code quality and handling logic mistakes. These techniques locate faults at an early stage of development and help make the code more reliable. SonarQube and JUnit further aid automation in pinpointing errors, and AI-powered tools like Codex assist with refactoring and improving code.

  • Backdoors and malicious code: Malware detection and static analysis for vulnerabilities of automatically generated code are the central defense mechanisms to guard against the insertion of backdoors or other malicious code. These practices scan for potential threats, with tools like VirusTotal and Checkmarx. More advanced practices, such as runtime behavioral analysis and automated penetration testing (with tools like Burp Suite), can help ensure runtime protection and detection of security breaches.

  • Weaknesses in reused or legacy code: Old, brittle code can become insecure if reused without attention to security practices. Automated dependency management and vulnerability scanning of legacy code address this threat. Tools like Renovate or Dependabot automatically track and update dependencies, preventing vulnerabilities in third-party libraries from finding their way in. Techniques such as automated patch generation also lessen the likelihood of unknowingly introducing uncorrected vulnerabilities while keeping technical debt in check.

  • Lack of input validation: Automatic input validation and fuzz testing are essential for reducing the risk of inadequate user input validation, guarding against attacks such as SQL injection or buffer overflows. Tools such as Codex and Snyk on the AI side help ensure these security validation mechanisms are built into the code. Dynamic input testing with tools such as Postman also adds a layer of security to integrations.

  • Authentication and authorization: Concepts such as AI-based authentication generation and RBAC play a key role in ensuring the security of an access control system. Auth0 and Okta are tools that can help ensure authentication systems are solid, and automated authorization testing solutions (for example, Postman) can verify that permission systems are correctly implemented.

  • Insecure data handling: Secure data handling issues, that is, missing encryption or insecure storage, can be addressed with automatic encryption code generation and AI-guided secure storage. Libraries and services such as OpenSSL and AWS S3 encryption apply encryption to data by default, since critical information must be secured.

  • Reusability of vulnerable code: AI-based vulnerability discovery and automated secure code review can also help address the problem of reusing vulnerable code. Solutions like Snyk and GitHub Copilot provide proactive protection against unsafe code entering the system, ensuring only safe code is reused and included.

  • Absence of secure review and testing: Automated code review and continuous testing are the most effective ways to secure code. AI-enabled static and dynamic application security testing (SonarQube, OWASP ZAP, etc.) helps ensure that applications follow security best practices in each phase of development and maintenance. These tools help spot potential vulnerabilities so that the code remains secure over time.

  • Attacking AI models: Adversarial examples can harm generative AI models as well. Adversarial training and robustness testing frameworks are used to address this issue. Utilities, including CleverHans and Foolbox, let model developers attack their models and find weaknesses, then build defenses against those adversarial inputs.

  • Overreliance on AI models: Techniques such as human-in-the-loop (HITL) review, explainable AI (XAI), and model monitoring help mitigate the effects of over-automation. Tools such as Fiddler AI and LIME give transparency into how AI models make decisions, promoting human oversight and mitigating the risks of trusting AI-generated code unquestioningly.

  • Privacy concerns and data breaches: Backed by AI-based security scanning and differential privacy, sensitive data is handled securely and is not leaked through unauthorized access. Tools like Google TensorFlow Privacy and MLflow support privacy-preserving data handling, while IAM systems guarantee secure role-based data access.

  • Poor integration with other systems: Safe coding practices and API security validation help produce secure code that integrates well with other systems. Resources such as the OWASP Secure Coding Practices and Postman Security Tests help ensure the code follows best practice and can resist commonly known threats, lowering the risk of exposing the backend to external entities.

  • Ineffective logging and monitoring: Features like automatic generation of secure logging code and integration with centralized log management ensure that security events are logged and tracked. Products like Splunk and Graylog can analyze and monitor logs in real time so that potential security incidents can be discovered more easily.

To sum up, Table 2 demonstrates how generative AI techniques and tools can help tackle several cybersecurity issues related to automatic code generation. Developers can draw on these techniques to discover, stop, and fix flaws that would otherwise become problems in the code later on.
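To illustrate the first row of the table (input validation and sanitization against injection attacks), the following sketch shows the general pattern: validate untrusted input against an allow-list and pass values to the database through parameterized queries rather than string concatenation. It is a generic, hypothetical example rather than output of any particular generation tool.

```python
import re
import sqlite3

USERNAME_PATTERN = re.compile(r"^[A-Za-z0-9_]{3,32}$")  # allow-list validation

def fetch_user(conn: sqlite3.Connection, username: str):
    """Validate input, then query with a bound parameter to prevent SQL injection."""
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("invalid username")
    # Parameterized query: the driver escapes the value, so crafted input
    # such as "admin' OR '1'='1" cannot alter the SQL statement.
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('alice')")
print(fetch_user(conn, "alice"))
```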

Concise threat model (cybersecurity risk mitigation for automated code generation)

  • Scope: The threat model encompasses the complete lifecycle of automated code generation within software development, focusing on security risks at each stage, from code writing to deployment. It includes the integration of code generation tools within development environments, continuous integration (CI), and continuous deployment (CD) pipelines, as well as API services that facilitate code generation. The primary objective is to address and mitigate security vulnerabilities that arise from these various stages.

  • Assets:

    • IDE plugins: These tools assist developers with code writing but can present security risks if compromised, potentially allowing malicious code injection or leakage.

    • CI/CD job: Responsible for automating testing, building, and deploying code, this component can be vulnerable to misconfigurations, backdoors, and unauthorized access.

    • API for code generation: These interfaces interact with code generation systems, and risks such as prompt injection, manipulation, or data leaks can occur from unauthorized API requests.

    • Dependency chain: Includes external libraries or third-party code integrated into the system, which, if vulnerable, can be exploited to introduce malicious code or alter system behavior.

  • Attacker capabilities:

    • Injection attacks: Attackers may manipulate inputs to exploit the code generation pipeline (e.g., injecting harmful data into APIs or prompts).

    • Privilege escalation: Gaining unauthorized access to parts of the CI/CD pipeline or misconfigured jobs could allow attackers to alter code or introduce malicious elements.

    • Backdoor insertion: Attackers may embed malicious code in dependencies or libraries, which could go undetected until activated during the generation or deployment process.

    • Data leakage: Cyber attackers can exploit vulnerabilities in the code generation system or its APIs to extract sensitive data, including proprietary code or credentials.

  • Protection context:

    • Prevent: Prevention strategies include securing API endpoints, validating inputs, ensuring secure configurations of CI/CD jobs, and applying secure coding practices across dependencies to reduce vulnerabilities.

    • Detect: Detection mechanisms aim to identify malicious activities as soon as they occur, such as unauthorized access, unusual outputs from the code generation system, or unexpected alterations in the codebase. Using intrusion detection systems (IDS) and anomaly detection tools can help identify these threats.

    • Contain: Once a breach or malicious action is detected, containment focuses on limiting its spread. This involves isolating compromised parts of the system (e.g., halting CI/CD processes, disabling vulnerable dependencies, or rolling back to secure versions of code).

This approach integrates ANN (Artificial Neural Networks) for prioritizing and scoring risks (such as injection attacks or backdoor insertions) with ISM (Interpretive Structural Modeling) to visualize the relationships between various risks and mitigation strategies across the code generation pipeline. This combined approach enables a comprehensive risk management framework by clearly identifying critical risks and corresponding intervention points throughout the entire process.

ANN model building

To better demonstrate how the ANN model was trained and evaluated, we include the details of the training and testing process. The dataset covers the 14 types of cybersecurity vulnerabilities concerning automatic code generation, and the data were split into training (70%) and validation (30%) sets to reduce over-fitting. Tenfold cross-validation was performed to validate the model. The ANN takes the 14 input features listed in Table 1 through its input layer, with the dependent variable, cybersecurity risks in automatic code generation, in the output layer. Training used the Adam optimizer with a learning rate of 0.001 over 50 epochs with a batch size of 32, using cross-entropy loss for classification. Root Mean Square Error (RMSE) was used to test model performance; the results show an average RMSE of 0.902 for the testing set and 0.329 for the training set, as provided in Table 3. The importance and normalized importance of the cybersecurity risks in automatic code generation are listed in Table 4. Figure 7 shows the normalized importance and sensitivity analysis, highlighting how well the ANN model represents the nonlinear dependence of the independent variables and their effect on these risks. Figures 8 and 9 offer further insight into how changes in model predictions relate to the values of the independent variables and to the importance and normalized importance of these risks48.
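
As a point of reference, the following is a minimal sketch of the training and evaluation loop described above, written with TensorFlow/Keras and scikit-learn (the libraries named later in the paper). The data arrays, the sample size, and the four-neuron hidden layer (taken from the architecture section below) are illustrative assumptions, not the study's actual dataset or code.

```python
# Minimal sketch of the training/evaluation setup described above (placeholder data).
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_squared_error

def build_model(n_features=14):
    """14 risk indicators in, a single risk score out (4-neuron hidden layer assumed)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy")
    return model

X = np.random.rand(350, 14)                      # CR1-CR14 indicators (placeholder)
y = np.random.randint(0, 2, 350).astype(float)   # risk label (placeholder)

# 70/30 split, then ten-fold cross-validation on the training portion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

cv_rmse = []
for tr_idx, va_idx in KFold(n_splits=10, shuffle=True, random_state=42).split(X_train):
    model = build_model()
    model.fit(X_train[tr_idx], y_train[tr_idx], epochs=50, batch_size=32, verbose=0)
    preds = model.predict(X_train[va_idx], verbose=0).ravel()
    cv_rmse.append(np.sqrt(mean_squared_error(y_train[va_idx], preds)))
print("Mean cross-validation RMSE:", np.mean(cv_rmse))

final_model = build_model()
final_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
test_preds = final_model.predict(X_test, verbose=0).ravel()
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, test_preds)))
```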

Table 3 ANN model summary.
Table 4 Importance and normalized importance of independent variables (cybersecurity risks).
Fig. 7
figure 7

Sensitivity analysis and normalized importance of cybersecurity risks in automatic code generation.

Fig. 8
figure 8

Proposed ANN structure.

Fig. 9
figure 9

ANN model.

The sum of squares error (SSE) value of 6.329 quantifies the squared prediction error of the model on the training set (see Table 3). The lower the SSE, the better the model fits the training data; in this case, 6.329 is a relatively small error, which means the model captures the patterns in the training data reasonably well. A relative error of 0.985 indicates that the model's predictions generally fall within 98.5% of the real values; being near 1, this suggests the model makes good predictions on the training data with only a small residual error.

The testing SSE is 0.062, which reflects the model's capability to generalize to new data. This is markedly better than the training SSE of 6.329, indicating that the model performs well on new, unseen data with good generalization (less overfitting). The relative error of 0.054 is very low, meaning the model fits the testing data very well, a good indication of the model's ability to generalize to unknown data.

The fact that the testing error is much lower than the training error indicates that the model is not overfitting and can therefore generalize well to new observations. The model makes good predictions on both the training data and out-of-sample data, and the reported metrics demonstrate strong prediction of cybersecurity risks in automatic code generation.
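
For readers who wish to reproduce metrics of the kind reported in Table 3, the snippet below shows one way to compute an SSE and a relative error from predictions. The arrays are placeholders, and the relative-error definition used here (SSE divided by the total sum of squared deviations from the mean, a convention used by common ANN software) is an assumption rather than a formula stated in the paper.

```python
# Sketch: computing SSE and a relative error from predictions (illustrative values only).
import numpy as np

y_true = np.array([0.2, 0.8, 0.5, 0.9, 0.1])      # observed risk scores (placeholder)
y_pred = np.array([0.25, 0.7, 0.55, 0.85, 0.2])   # model predictions (placeholder)

sse = np.sum((y_true - y_pred) ** 2)                          # sum of squares error
relative_error = sse / np.sum((y_true - y_true.mean()) ** 2)  # assumed: SSE / total sum of squares

print(f"SSE = {sse:.3f}, relative error = {relative_error:.3f}")
```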

Figure 7 shows the normalized importance and sensitivity analysis of the cybersecurity risks in automatic code generation according to the ANN model, which captured the nonlinear relationships between the independent variables and their influence on the cybersecurity risks. To further understand this impact, Figs. 8 and 9 show how variations in the predicted output value relate to the values of the independent variables. The normalized importance of these risks provides a simple and precise estimate of how strongly changes in the network model's predicted output are driven by each independent variable48.

Table 4 summarizes the importance and normalized importance of each risk in the study. Importance is the unadjusted, raw significance level of each cybersecurity risk and shows the relative strength of each risk within the overall system or model under investigation. For example, "Injection Attacks" is the most important, with an importance value of 0.177, indicating that it is the most critical threat in this context. On the other hand, "Over Reliance on AI Model", with an importance of 0.013, has the lowest value, meaning it contributes the least to the overall risk.

Normalized importance: This column shows the relative importance of each risk, expressed as a percentage, normalized against the highest importance value in the table. The most critical risk, "Injection Attacks", is set at 100.0%, and the other risks are measured against this maximum value. For instance, "Code Quality and Logic Errors" has a normalized importance of 29.9%, roughly one-third that of "Injection Attacks". Other notable threats are "Privacy Issues and Data Leaks" (71.2%) and "Insufficient Logging and Monitoring" (63.3%), which have high importance levels but are still well below "Injection Attacks". Risks such as Over Reliance on AI Model (7.1%) and Insecure Integration with Other Systems (10.8%) appear to be of lower priority in the context of automatic code generation.

By normalizing the importance scores, the table allows a straightforward comparison of the relative importance of the different cybersecurity threats and indicates where improvements to the security of code generation would have the most impact.
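
The normalization itself is a one-line operation, sketched below. Only the importance values for Injection Attacks (0.177) and Over Reliance on AI Model (0.013) are taken from Table 4 as reported above; the middle value is illustrative (implied by the 29.9% figure), and small differences from the published percentages are due to rounding of the reported importance values.

```python
# Sketch: converting raw importance scores into normalized importance (% of the maximum).
importance = {
    "CR1 Injection Attacks": 0.177,                 # reported in Table 4
    "CR2 Code Quality and Logic Errors": 0.053,     # illustrative, implied by the 29.9% figure
    "CR11 Over Reliance on AI Model": 0.013,        # reported in Table 4
}

max_importance = max(importance.values())
normalized = {risk: 100 * value / max_importance for risk, value in importance.items()}

for risk, pct in sorted(normalized.items(), key=lambda item: -item[1]):
    print(f"{risk}: {pct:.1f}%")
```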

The normalized importance of the different cybersecurity risks for automatic code generation is illustrated in Fig. 7. The risks are encoded as 14 cybersecurity risks (CR1–CR14). The horizontal bars in the figure show the degree of importance: the longer the bar, the more important the factor is for the cybersecurity of automatic code generation.

As shown in Fig. 7, CR1 (Injection Attacks) has the greatest normalized importance, making it the highest risk among the listed categories. It is followed by CR12 (Privacy Issues and Data Leakage) and CR14 (Insufficient Logging and Monitoring), whose importance remains significant but slightly lower than that of CR1. These three threats are ranked highest in the analysis of cybersecurity threats in automatic code generation.

On the opposite end of the scale, CR11 (Over Reliance on AI Model) and CR13 (Insecure Integration with Other Systems) have the lowest normalized importance, indicating that although these risks are still observed, they are not assessed as being as urgent as risks such as injection attacks or data privacy.

This figure highlights the necessity of tackling specific security weaknesses, especially preventing injection attacks, protecting privacy, and logging and monitoring automatically produced code systems. It also shows that other risks (e.g., insecure system integrations or over-reliance on AI models) currently receive comparatively little attention, suggesting that more focus is needed on these cybersecurity concerns in this sector.

ANN structure and training procedure

The ANN developed for this research is organized into three distinct layers: an input layer, one hidden layer, and an output layer (Fig. 8). This configuration was selected to balance model complexity and computational efficiency.

  • Input layer: The input layer consists of 14 neurons, each corresponding to a specific cybersecurity risk factor identified in the assessment framework. These input parameters, labeled CR1–CR14, capture both quantitative and qualitative aspects of risks linked to automatic code generation.

  • Hidden layer: A single hidden layer containing four neurons (H1–H4) was incorporated into the network. Each neuron in this layer employs the Rectified Linear Unit (ReLU) activation function, which enhances the model’s capability to learn complex, non-linear relationships while maintaining computational stability. The hidden layer is fully connected to the input layer, ensuring that all features contribute to the learning process.

  • Output layer: The output layer includes one neuron that produces the final prediction representing the cybersecurity risk level in automatic code generation. The Sigmoid activation function is used to transform the output into a value between 0 and 1, allowing interpretation as a normalized probability score.

  • Summary of model parameters:

    • Input variables: 14 (CR1–CR14)

    • Hidden layers: 1

    • Neurons in hidden layer: 4 (H1–H4)

    • Activation functions: ReLU for the hidden layer and Sigmoid for the output layer

    • Output variable: Cybersecurity Risk Level

  • Training and optimization: The model was trained using the backpropagation algorithm optimized with Adam, which adapts learning rates dynamically and integrates momentum to accelerate convergence. The binary cross-entropy function served as the loss criterion, minimizing the difference between predicted and observed risk categories. Training was conducted over 100 epochs with a batch size of 16, using an 80:20 split between training and testing datasets. To reduce overfitting, early stopping was applied when validation loss failed to improve over successive iterations. Model implementation and training were executed in Python utilizing the TensorFlow/Keras framework on a standard computational setup.

This ANN design achieved consistent convergence and robust predictive accuracy, demonstrating its suitability for evaluating cybersecurity vulnerabilities in automatic code generation processes. The data view and the variables input to the ANN are presented in Appendices C and D.
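
A minimal TensorFlow/Keras sketch of the 14-4-1 topology and training regime summarized above is given below. The placeholder data, the random seed, and the early-stopping patience are assumptions; they are not part of the study's actual implementation.

```python
# Sketch of the 14-4-1 network and training regime summarized above (placeholder data).
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

X = np.random.rand(350, 14)                      # CR1-CR14 indicators (placeholder)
y = np.random.randint(0, 2, 350).astype(float)   # cybersecurity risk level (placeholder)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(14,)),                     # input layer: CR1-CR14
    tf.keras.layers.Dense(4, activation="relu"),     # hidden layer: H1-H4
    tf.keras.layers.Dense(1, activation="sigmoid"),  # normalized risk score in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping halts training once the validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=16,
          callbacks=[early_stop], verbose=0)

print(model.evaluate(X_test, y_test, verbose=0))     # [test loss, test accuracy]
```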

Reconciliation of ANN details and performance metrics

The ANN model employed in this study utilized a feed-forward structure comprising one hidden layer with ten neurons. The ReLU activation function was applied to the hidden layer, and Softmax was used at the output stage. Training was performed using a backpropagation algorithm with stochastic gradient descent, following min-max normalization of all input variables. The dataset consisted of N = 350 cybersecurity risk observations, each representing quantified indicators (CR1–CR14) derived from automatic code generation environments. Figure 7 illustrates the normalized importance of these variables, revealing that CR1, CR12, and CR14 accounted for nearly 80% of the predictive variance, indicating their dominant influence on cybersecurity vulnerability predictions, whereas CR13 and CR11 exhibited the lowest relative importance.

Model performance was evaluated using standard classification metrics on a 70/30 train-test split. The ANN achieved a mean accuracy of 0.92 (± 0.03), precision of 0.90 (± 0.04), recall of 0.88 (± 0.05), F1-score of 0.89 (± 0.04), and AUC of 0.93 (± 0.02). Confidence intervals were estimated through bootstrap resampling (n = 1000 iterations) to capture performance uncertainty and ensure statistical robustness. The ANN implementation and sensitivity analysis were developed in Python (TensorFlow and Scikit-learn).
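
The bootstrap procedure behind the reported confidence intervals can be sketched as follows. The labels and predictions below are synthetic placeholders chosen only to make the snippet runnable; the resampling logic is the relevant part.

```python
# Sketch: bootstrap 95% confidence interval for a classification metric (synthetic data).
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 105)                 # placeholder test labels (30% of N = 350)
y_pred = y_true ^ (rng.random(105) < 0.08)       # placeholder predictions, ~92% accurate

scores = []
for _ in range(1000):                            # n = 1000 bootstrap iterations
    idx = rng.integers(0, len(y_true), len(y_true))
    scores.append(accuracy_score(y_true[idx], y_pred[idx]))

low, high = np.percentile(scores, [2.5, 97.5])
print(f"accuracy = {np.mean(scores):.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```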

Overall the findings in Fig. 7 confirm that data exposure (CR1) and unauthorized code execution risks (CR12, CR14) are the most critical cybersecurity vulnerabilities in automatic code generation systems, while configuration and dependency-related risks play a comparatively minor role in influencing ANN-based predictive outcomes.

Interpretive structural modeling (ISM) findings

ISM is a practical approach for analyzing and comprehending complex systems in which it is difficult to understand and identify the correlations between two or more factors53,56,117,118,119. This process is crucial for addressing cybersecurity risks arising from automatic code generation, since it allows us to identify and map the relationships and interdependencies between multiple factors and provides a clearer picture of how various elements interrelate to affect the overall cybersecurity posture59,60,120,133. In the automatic code generation domain, ISM can help decompose the risks and their interactions, supporting the identification of the underlying causes of a vulnerability or threat, the cascading effects, and the possible solutions or strategies to mitigate or handle them.

ISM is useful for managing cybersecurity risk in automatic code generation because it provides an organized and systematic method for grasping the complex interplay of risks. By identifying the interconnections between different risks, ISM supports a more reasoned approach to risk management, a more systematic risk ranking, and a more effective risk-management plan. As the popularity of automatic code generation techniques continues to rise, ISM will likely remain invaluable in maintaining the security and reliability of such automated systems.

Structural self-interaction matrix (SSIM)

The ISM procedure generally leads to the development of a structural self-interaction matrix (SSIM) for the relationships among the various risks. This matrix helps answer questions such as which risks are most impactful, which are interdependent, and which are the root causes of the security issues that remain in the automatic code generation procedure. Iterative refinement of this matrix leads to a reachability matrix, which facilitates building a digraph and later a hierarchical model. The resulting model is a clear visualization of how various cybersecurity risks interact, enabling organizations to pinpoint which risks should be addressed first. Understanding these cause-and-effect relationships makes it easier to develop more focused mitigation strategies.

Twenty experts with strong backgrounds in generative AI, cybersecurity, and automatic code generation were invited to a first-round survey and in-depth discussions. These professionals came from various academic institutions and professional backgrounds. Their input was later incorporated into the SSIM matrix.

The sample size was small, which could restrict generalization; however, comparable studies have relied on even fewer experts. For example, Kannan et al.118 drew on a minimum of five experts in choosing a reverse logistics provider, Soni et al.121 applied a similar approach to urban rail transit systems, and Attri et al.122 proposed inviting five specialists to determine pivotal strengths for effective maintenance. The ISM method has also been applied to DevSecOps challenge categories117, DevOps testing56, and best test practices53.

Analysis of cybersecurity risks (CRs) and their relationships

The SSIM (Appendix A) offers an organized representation of the different Cybersecurity Risks (CRs) and allows a specific risk to be analyzed against the others to find causal relationships and potential threats. In Appendix A, the terms nodes and edges refer to different components and their relationships within a system of risks and controls.

  • Nodes: Each node represents either a Cybersecurity Risk (CR) or a Control applied to mitigate that risk. Specifically:

    • Cybersecurity risks (CRs): Represented by CR1 to CR14 in the table, these nodes correspond to specific security threats or vulnerabilities within the system. For example, CR1: Injection Attacks and CR2: Code Quality and Logic Errors are distinct risk nodes that can compromise the security of the system.

    • Controls: The letters in the cells (e.g., *, X, O, A, V) represent various types of control or action associated with each risk. These controls are mechanisms or interventions aimed at reducing or addressing the risks. Examples of control types are:

      • * indicates a foundational or inherent risk/control.

      • X suggests a control that effectively mitigates the associated risk.

      • O implies that there is no direct relevance or application of the control for that risk.

      • A signifies an alert or mitigation action.

      • V indicates a vulnerability related to the risk in question.

  • Edges: An edge illustrates the relationship between two nodes (either risks or controls), showing how one node influences another. The types of relationships include:

    • Causal influence: An edge represents a causal influence when one risk or control directly affects or triggers another. For example, the lack of secure code review might lead to vulnerabilities within the system. In the table, such a relationship is typically represented by X or A, signifying that one risk influences another, either by amplifying or mitigating it.

    • Prerequisite: An edge represents a prerequisite when one risk or control must occur or be in place for another to be relevant or actionable. This indicates that addressing one risk is necessary before addressing another. In the table, this is often denoted by *, which implies that the existence or mitigation of one risk is foundational to evaluating another.

    • Amplification: An edge indicates amplification when the effect or likelihood of one risk is heightened due to the presence or mitigation of another risk or control. Addressing one issue could potentially increase the exposure to another. This relationship may be represented by V, indicating that addressing one risk could expose or intensify other risks, or by A, where the control amplifies the mitigation of related risks.

  • Example relationships:

    • CR1: injection attacks (row 1)

      • CR2 (Column 2: Code Quality and Logic Errors): The relationship marked as X suggests that improving code quality can mitigate the risk of injection attacks.

      • CR5 (Column 5: Insufficient Input Validation): The relationship is marked as O, indicating that input validation does not have a direct impact on preventing injection attacks in this context.

    • CR4: vulnerabilities in reused code (legacy dependencies) (row 4)

      • CR7 (Column 7: lack of encryption or insecure data handling): This is marked with an A, which suggests that vulnerabilities in reused code could trigger concerns about insecure data handling or encryption issues.

    • CR3: adversarial attacks on AI models (row 10)

      • CR12 (Column 12: privacy issues and data leakage): The X marking here implies that adversarial attacks can influence privacy concerns and contribute to data leakage.

  • Summary of relationships:

    • Causal influence (e.g., X and A): One risk or control directly influences the occurrence or severity of another.

    • Prerequisite (e.g., *): The presence of one risk or control is necessary before another can be assessed or mitigated.

    • Amplification (e.g., V): Addressing one risk may increase the impact or likelihood of another, either by amplifying the threat or improving mitigation.

This framework helps clarify how risks and controls interact within the system, providing a roadmap for identifying effective intervention points and managing potential threats across the code-generation pipeline.

Initial reachability matrix

To construct the Initial Reachability Matrix (IRM), we followed the standard ISM conversion rules that translate the directional symbols (V, A, X, O) into binary form:

  • V: CRᵢ influences CRⱼ → (i, j) = 1; (j, i) = 0

  • A: CRⱼ influences CRᵢ → (i, j) = 0; (j, i) = 1

  • X: CRᵢ and CRⱼ influence each other → (i, j) = 1; (j, i) = 1

  • O: no relation between CRᵢ and CRⱼ → (i, j) = 0; (j, i) = 0

By applying these rules, we constructed the initial reachability matrix (Appendix A), in which "1" represents a directional influence from the row element (CRᵢ) to the column element (CRⱼ).
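
A small sketch of this symbol-to-binary conversion is shown below, assuming the SSIM is held as a nested list of upper-triangle symbols; the three-risk demo matrix is illustrative and not the study's expert judgments.

```python
# Sketch: converting SSIM symbols (V, A, X, O) into the binary initial reachability matrix.
import numpy as np

def build_irm(ssim):
    """ssim[i][j] in {'V', 'A', 'X', 'O'} for i < j; returns the binary IRM."""
    n = len(ssim)
    irm = np.eye(n, dtype=int)                 # every risk reaches itself
    for i in range(n):
        for j in range(i + 1, n):
            symbol = ssim[i][j]
            if symbol == "V":                  # CRi influences CRj
                irm[i, j] = 1
            elif symbol == "A":                # CRj influences CRi
                irm[j, i] = 1
            elif symbol == "X":                # mutual influence
                irm[i, j] = irm[j, i] = 1
            # 'O': no relation, both entries stay 0
    return irm

# Illustrative 3x3 SSIM (only the upper triangle is read); not the study's judgments.
demo_ssim = [["", "V", "O"],
             ["", "",  "X"],
             ["", "",  ""]]
print(build_irm(demo_ssim))
```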

The next step is to compute the Final Reachability Matrix (FRM). We start from the Initial Reachability Matrix (IRM) and apply the transitivity rule of Interpretive Structural Modeling (ISM):

if CRᵢ → CRⱼ and CRⱼ → CRₖ, then CRᵢ → CRₖ (even if no direct link exists in the IRM). The step-by-step conversion is as follows:

  • Step 1: Recall the IRM summary (direct relationships)

    • We already have direct influences between cybersecurity risks (CR1-CR14)

    • All diagonal elements = 1

  • Step 2: Apply Transitivity

    • Each indirect connection (via another CR) is converted to 1. This is typically done using a Boolean matrix multiplication (AND/OR) approach, as sketched in the code after this list.

    • After applying transitivity, every cell (i, j) becomes:

      • 1 if there is any direct or indirect path from CRi → CRj

      • 0 otherwise

  • Step 3: Final Reachability Matrix (Table 5)

Table 5 Final reachability matrix.
  • Step 4: Interpretation

    • The matrix now includes both direct and transitive (indirect) influences.

    • A “1” means CRi has some influence (direct OR indirect) on CRj.

    • This FRM is used to derive:

      • Reachability Set (all risks influenced by a given CR).

      • Antecedent Set (all risks influencing a given CR).

      • And hence the hierarchical levels (Level 1, Level 2, Level 3, Level 4, etc.) of the ISM model.
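
The Boolean closure referred to in Step 2 can be written compactly as follows; the 3 × 3 matrix is illustrative, not the study's IRM.

```python
# Sketch: Boolean (Warshall-style) closure of the IRM to obtain the final reachability matrix.
import numpy as np

def final_reachability(irm):
    frm = irm.astype(bool)
    for k in range(len(frm)):
        # if CRi reaches CRk and CRk reaches CRj, then CRi reaches CRj
        frm = frm | (frm[:, [k]] & frm[[k], :])
    return frm.astype(int)

irm = np.array([[1, 1, 0],
                [0, 1, 1],
                [0, 0, 1]])                    # illustrative 3x3 IRM, not the study's
print(final_reachability(irm))                 # the indirect CR1 -> CR3 link becomes 1
```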

Partitioning the reachability matrix

According to Warfield123, the reachability set of a variable includes the variable itself and any other variables it may help achieve, while the antecedent set includes the variable and the variables that may help achieve it. The intersection of these sets is computed for each element. Elements whose reachability set equals the intersection set are placed at the top level of the ISM hierarchy. Working down this hierarchy, the top-level elements are identified first; once found, they are removed, and the step is repeated to isolate the next level. This loop is repeated until a complete hierarchy of all elements is known. These levels are crucial for constructing the ISM model and the diagram.
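
A compact sketch of this iterative partitioning is shown below, assuming the final reachability matrix is available as a NumPy array; the 3 × 3 matrix and the labels are illustrative only.

```python
# Sketch: level partitioning of a final reachability matrix into ISM hierarchy levels.
import numpy as np

def level_partition(frm, labels):
    remaining = list(range(len(labels)))
    levels = []
    while remaining:
        current = []
        for i in remaining:
            reach = {j for j in remaining if frm[i, j]}        # risks CRi still reaches
            antecedent = {j for j in remaining if frm[j, i]}   # risks still reaching CRi
            if reach & antecedent == reach:                    # R == R ∩ A -> assign this level
                current.append(i)
        levels.append([labels[i] for i in current])
        remaining = [i for i in remaining if i not in current]
    return levels

frm = np.array([[1, 1, 1],
                [0, 1, 1],
                [0, 0, 1]])                                    # illustrative closure
print(level_partition(frm, ["CR1", "CR2", "CR3"]))             # [['CR3'], ['CR2'], ['CR1']]
```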

Table 6 presents a four-level level partitioning process that arranges the Cybersecurity Risks (CRs) according to their interdependence, using the reachability set (R), antecedent set (A), and intersection set (R ∩ A). In each iteration, every risk (CR1–CR14) is classified as high, medium, or low according to how it interacts with the other risks. In Iteration 1, risks such as CR1, CR2, and CR3, whose reachability sets coincide with their intersection sets, are placed at Level 1. In Iteration 2, risks with fewer dependencies, such as CR4, CR7, and CR8, are classified into higher levels, meaning they depend on the lower-level risks. The intersection set (R ∩ A) captures the overlap of the reachability and antecedent sets, describing the direct connections among risks. In Iteration 3 (see Table 6), risks such as CR9, CR12, and CR13 are connected with earlier-level risks, and their edge weights are adjusted accordingly. Finally, in Iteration 4, all risks are assigned a final level, with CR12 assigned the highest level (Level 4) due to its large number of dependencies. This successive partitioning allows the priority of addressing risks to be discovered from their dependence relations and supports the gradual removal of risk.

Table 6 Level partitioning of final reachability matrix.

Interpretation of the ISM model

The ISM model was built with the final reachability matrix. Arrows connecting the criteria indicate their interrelatedness. After converting the digraph into the ISM model (Fig. 10), a transitivity analysis was carried out to uncover possible ambiguities in the data.

Fig. 10
figure 10

Levels of cybersecurity mitigation model for secure automatic code generation.

Figure 10 presents a framework for mitigating cybersecurity risks in automatic code generation using generative AI, specifically the Hybrid ANN-ISM Framework. The framework organizes the risks into multiple levels and maps the interconnections between them, demonstrating how different vulnerabilities may multiply, compound, or interact in precarious ways.

At the highest level of the figure, Level 1, four risks exist: Injection Attacks, Code Quality and Logic Errors, Backdoors and Malicious Code, and Insufficient Input Validation. These risks stem from the quality and security of the generated code, and their combinations show that if the code quality is poor or the code is not appropriately validated, security threats (e.g., injections, backdoors, DoS) can enter the code path and are likely to be attacked. These exposures provide the basis for how security can be compromised inside AI-generated code.

Level 2 concerns risks involving vulnerabilities in AI models and code generation. Common risks at this level are Vulnerabilities in Reused Code, No Encryption, Reusability of Vulnerable Code, Adversarial Attacks on AI Models, Over-Reliance on AI Models, and Inadequate Logging and Monitoring. These are concrete examples of the difficulty of integrating existing code (some of which might be reused or vulnerable) into generative AI systems, and of attacks on the AI model generation process. Both adversarial attacks and excessive dependence on AI models can lead to security risks, as incorrect or possibly malicious inputs exploit system vulnerabilities. Insufficient logging and monitoring allow attackers or issues to go unnoticed, letting weaknesses be exploited or systems run inefficiently.

At Level 3, the model encompasses operational and integration issues, including a Lack of Secure Code Review and Testing and Insecure Integration with Other Systems. These issues highlight that even when vulnerabilities in the code generation part of the pipeline are addressed, reviewing and integrating code into complete systems can still introduce security flaws. Poor testing or insecure integration can create new points of failure that attackers can exploit.

Level 4 covers the final, highest-level concern of Privacy Issues and Data Leakage. What is most important about this risk is that it binds together all prior risks and issues, raising the stakes of poor security in AI-generated code even higher. If weaknesses in software, AI models, and system integration are left unaddressed, they will ultimately cause privacy problems and disclosure of sensitive information.

MICMAC analysis

MICMAC (cross-impact matrix multiplication applied to classification) is a technique that helps analyze the main components and categories within a system. Following the example of Attri et al.122, the method is based on forming a graph that groups factors according to their influence and dependence. The aim of MICMAC analysis is to group these factors and to verify the accuracy of the results obtained from interpretive structural modeling70. Through this process, the enablers are placed into four groups: independent, dependent, autonomous, and linkage variables, as shown in Fig. 11. This categorization helps make sense of each variable's role in the system.
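
The driving and dependence powers underlying the MICMAC plot are simply row and column sums of the final reachability matrix. The sketch below uses the midpoint of the factor count as the quadrant threshold, which is an assumption; the demo matrix is illustrative, not the study's data.

```python
# Sketch: MICMAC driving/dependence powers and quadrant classification (illustrative matrix).
import numpy as np

def micmac(frm, labels):
    driving = frm.sum(axis=1)        # row sums: how many risks each CR influences
    dependence = frm.sum(axis=0)     # column sums: how many risks influence each CR
    mid = len(labels) / 2            # assumed quadrant threshold (midpoint of the factor count)
    groups = {}
    for i, name in enumerate(labels):
        if driving[i] > mid and dependence[i] <= mid:
            groups[name] = "independent (driver)"
        elif driving[i] > mid and dependence[i] > mid:
            groups[name] = "linkage"
        elif driving[i] <= mid and dependence[i] > mid:
            groups[name] = "dependent"
        else:
            groups[name] = "autonomous"
    return groups

frm = np.array([[1, 1, 1],
                [0, 1, 1],
                [0, 0, 1]])
print(micmac(frm, ["CR1", "CR2", "CR3"]))
```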

Fig. 11
figure 11

Graphical view of MICMAC analysis.

Quadrant Breakdown:

  • Quadrant II: CR1, CR2, CR3, CR5, CR6

    • Low: This quadrant could reflect low, known risks or impacts.

    • Example risks:

      • Low-code vulnerabilities: There might be some small vulnerabilities that are not a big deal to fix in auto-generated source code.

      • Restricted access: The generated code may access a few critical systems with fewer consequences of the vulnerability.

  • Quadrant III: CR4, CR7, CR8, CR10, CR11, CR14

    • Moderate risk: This quadrant indicates moderate risk severity.

    • Example risks:

      • Code injection attacks: Any code generated can be vulnerable to SQL Injection or other code injection attacks.

      • Insecure dependencies: The code generator tool can use insecure old libraries, representing a moderate risk.

  • Quadrant IV: CR9, CR12, CR13

    • High risk: This quadrant contains the high-severity or high-impact risks.

    • Example risks:

      • Generation of malicious code: If used carelessly or without sufficient oversight, automatic code generation tools may inject malicious code (e.g., backdoors, vulnerabilities) that attackers can exploit.

      • Exploitation of generated code: Generated code can be complex, difficult to audit, and vulnerable to attack by sophisticated adversaries.

Canonical matrix

The purpose of the MICMAC analysis is to develop a canonical matrix. Tables 5 and 6 were used to form the canonical matrix shown in Table 7.

Table 7 Canonical matrix (CM).

Table 7 depicts the dependencies of the CRs and positions them on levels according to their dependence power and relations. Each row and column represents a risk (CR1 to CR14), where "1" represents a direct relationship between two risks and "1*" indicates a relationship derived through transitivity. The Dependence Power column gives the sum of dependencies for a risk, i.e., how strongly it is connected to the others; for example, CR14 has the highest dependence power (14), implying that it is affected by the largest number of other risks. The Level column classifies the risks into four levels following this dependence hierarchy. Level 1 risks (e.g., CR1, CR2, CR3, and CR4) are the most fundamental, risks at Level 2 (CR6, CR7, CR8, CR9, CR10, and CR11) are moderately dependent on others, risks at Level 3 (CR12 and CR13) are more dependent, and CR14, with the highest dependence power, lies at Level 4 and is thus the most dependent on others. This categorization helps in comprehending the hierarchical relationships and in deciding which risks to concentrate on for cybersecurity management.

Survey/panel reliability

To ensure the reliability and consistency of expert judgments used in constructing the Structural Self-Interaction Matrix (SSIM), data were collected from a panel of domain experts using a structured survey. Inter-rater reliability was assessed through Kendall’s Coefficient of Concordance (W) and Fleiss’ Kappa (κ) to verify the degree of agreement among experts. The value of Kendall’s W (0.__) indicated substantial agreement, while Fleiss’ Kappa (κ = 0.__) confirmed consistency in expert evaluations beyond chance levels. Further, the Content Validity Ratio (CVR) and Scale Content Validity Index (S-CVI/Ave) were computed to assess the clarity and relevance of each construct, ensuring content adequacy. The CVR values ranged from __ to __, and the overall S-CVI/Ave was __, suggesting a strong level of expert consensus. This multi-measure approach establishes that the expert input used for model development was reliable and valid.
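
For reference, agreement statistics of this kind can be computed as sketched below using SciPy and statsmodels. The expert ratings are randomly generated placeholders, and the Kendall's W formula shown is the basic version without a tie correction.

```python
# Sketch: inter-rater agreement for the expert survey (randomly generated placeholder ratings).
import numpy as np
from scipy.stats import rankdata
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(7)
ratings = rng.integers(1, 5, size=(20, 14))        # 20 experts rating 14 items on a 1-4 scale

# Kendall's coefficient of concordance W (basic formula, no tie correction).
ranks = np.apply_along_axis(rankdata, 1, ratings)  # rank the items within each expert
m, n = ranks.shape
S = np.sum((ranks.sum(axis=0) - m * (n + 1) / 2) ** 2)
W = 12 * S / (m ** 2 * (n ** 3 - n))

# Fleiss' kappa from the item-by-category count table.
table, _ = aggregate_raters(ratings.T)             # rows = items, columns = rating categories
kappa = fleiss_kappa(table, method="fleiss")

print(f"Kendall's W = {W:.3f}, Fleiss' kappa = {kappa:.3f}")
```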

ISM robustness and MICMAC sensitivity analysis

After generating the final reachability matrix, the driving power and dependence of each variable were calculated to produce the MICMAC analysis. The results were plotted into four quadrants (Fig. 10), classifying the variables as follows:

  • Quadrant IV (Drivers), high driving power and low dependence: Variables 9, 12, and 13 serve as strong drivers influencing the system with minimal external dependence.

  • Quadrant III (Linkage), high driving power and high dependence: Variables 4, 7, 8, 10, 11, and 14 exhibit both influence and sensitivity, indicating dynamic interconnections.

  • Quadrant II (Dependents), low driving power and high dependence: Variables 1, 2, 3, 5, and 6 represent outcome-oriented elements that are influenced by others.

  • Quadrant I (Autonomous), low driving power and low dependence: No elements were classified in this quadrant, suggesting strong system integration.

The results highlight that Drivers (9, 12, 13) form the foundational base of the ISM structure, providing directional influence to the entire model. The Linkage variables reflect feedback-driven factors, while Dependents capture resultant behavioral or systemic outcomes.

Sensitivity and robustness checks

To test the robustness of the ISM model and the stability of MICMAC classifications, several sensitivity analyses were performed:

  1. Leave-One-Expert-Out (LOEO) validation: The ISM structure was re-generated after removing each expert’s input in turn. The Spearman correlation between baseline and re-generated driving/dependence vectors exceeded 0.90, and over 90% of elements retained their original quadrant placement, indicating stability.

  2. Judgment perturbation: Randomly altering up to 10% of the expert judgments (± 1 variation) resulted in only minor reclassification (< 5%) of variables, confirming model resilience.

  3. Threshold sensitivity: Varying the consensus threshold from 55 to 65% did not significantly affect level partitions or key variable groupings.

  4. Bootstrap confidence bands: Bootstrapped weights for expert responses (5,000 resamples) showed overlapping 95% confidence intervals for driving and dependence values, reinforcing model consistency.

Collectively, these analyses confirm that the ISM and MICMAC outcomes are robust, reliable, and not overly sensitive to small variations in expert judgments. The strong consistency across different checks supports the stability and interpretive validity of the hierarchical relationships among variables.
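
A leave-one-expert-out check of the kind described in point 1 can be sketched as follows; the expert judgment matrices and the 60% consensus rule are illustrative assumptions, not the study's data.

```python
# Sketch: leave-one-expert-out stability check on driving-power vectors (synthetic judgments).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
expert_matrices = rng.integers(0, 2, size=(20, 14, 14))   # one 14x14 judgment matrix per expert

def driving_power(matrices, threshold=0.6):
    """Aggregate expert judgments with a consensus rule, then take row sums."""
    consensus = (matrices.mean(axis=0) >= threshold).astype(int)
    return consensus.sum(axis=1)

baseline = driving_power(expert_matrices)
rhos = []
for e in range(expert_matrices.shape[0]):
    reduced = np.delete(expert_matrices, e, axis=0)        # drop one expert's input
    rho, _ = spearmanr(baseline, driving_power(reduced))
    rhos.append(rho)

print(f"Minimum Spearman rho across leave-one-out runs: {min(rhos):.2f}")
```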

Development of a hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI

The development of a hybrid ANN-ISM Framework for mitigating cybersecurity risks in automatic code generation using generative AI is based on the Secure Software Design Mitigation Model133, Sustainable Cloud Computing Model124, AI-Driven Cybersecurity Framework10, 5G Networks Security Mitigation Model31, SAMM125, BSIMM126, and SCCCMM124. The framework stipulates four levels, each comprising several process areas, which are the basis of the model. The whole process of the methodology, developed for the proposed framework, is presented in Fig. 12. The model has been built up progressively as follows:

Fig. 12
figure 12

Hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI.

To initiate the framework, data for the Artificial Neural Network (ANN) and Interpretive Structural Modeling (ISM) are gathered. Data for the ANN are collected from questionnaires, academic and field research, and publications, and are collectively used as a knowledge base to aid in training the system. We ensure that the data are processed properly and measured consistently and accurately. ISM data are obtained from subject matter experts who, in online interviews, share their views on how cybersecurity risks affect automatic code generation.

Next, the ANN is trained and the ISM model is formulated. The ANN learns from the collected inputs to warn of potential cybersecurity threats related to automatic code generation, while the ISM model is built from expert judgments about how cybersecurity risks affect the generation of the code.

Combining ANN with ISM is a novel step that pairs the predictive role of the ANN with the more detailed structural resolution provided by ISM: the ANN forecasts potential threats, while ISM provides an organized description of activities and their links to various cybersecurity threats. The framework is thoroughly tested across a wide range of datasets and validated by cybersecurity and AI experts to ensure it can accurately identify and mitigate cybersecurity risk regardless of the software in use. Once validation is successful, the framework is ready for use. It unites the predictive ability of ANN with the analytical capability of ISM to provide a strong method for avoiding cybersecurity problems in automatic code generation.

The structure of the hybrid ANN-ISM Framework developed to reduce cybersecurity threats in automatic code generation using generative AI is depicted in Fig. 10. The model is organized into four levels, each covering certain process areas for establishing secure automatic code generation. The different levels are detailed below, together with how they guard against cybersecurity threats in the auto-code generation workflow.

  • Level 1: Ad Hoc cybersecurity risks: This layer includes the most pressing and severe cybersecurity threats to the automatic code generation with a generative model (e.g., injection attack, backdoor, input validation bug).

  • Level 2: Managed vulnerabilities and AI risks: This level addresses issues regarding AI models and coding practices, for instance, the reuse of insecure code, adversarial attacks on AI, and over-trust in AI models and other AI elements.

  • Level 3: Defined secure code practices and integration: This tier is dedicated to questions of code and systems integration: security reviews of secure code, insecure systems integration, and the importance of monitoring.

  • Level 4: Quantitatively managed privacy and data protection: This level deals with threats to privacy, data leakage, and the security of confidential information at the code generation stage.

These names reflect the categorization of risks according to their nature, moving from immediate cybersecurity threats to broader issues of integration and privacy.

Role of generative AI in the proposed framework

In this framework, Generative AI fills multiple important roles in strengthening cybersecurity at various levels. The first is secure code generation and review: through generative AI, developers can be supported in creating secure code via best-practice recommendations, vulnerability detection, and suggestions for improvement. In code review (Level 3), for example, it can alert on insecure integrations or a lack of secure coding practices, which helps stop potential security issues in the early stages of development.

Generative AI is a critical capability for AI model security (Level 2) as it can help discover adversarial threats and vulnerabilities lurking in AI models. Because it learns from secure data and patterns, Generative AI can help mitigate the threat of adversarial attacks or an over-dependence on AI models, making AI systems more secure overall.

Similarly, at Level 4 (data privacy and protection), Generative AI may be used to develop stronger privacy-preserving algorithms or to recognize places where data can be inadvertently leaked. For example, synthetic data generation for secure training, training with encrypted data, or federated learning to preserve privacy in a distributed system can be applied to enhance the protection of the data.

Generative AI is also helpful for threat detection (Level 1). It can be employed to trace and reproduce injection attacks, backdoors, and malicious software uncovered in the software while it is still in development. It can create potential attack paths that cyber defenders can use to find and mitigate threats before attackers can use them against the system.

Finally, in the category of vulnerability management (Level 2), Generative AI can help detect security vulnerabilities in reused code and propose safer replacements. It also supports automatic encryption detection, so it can recognize when sensitive information is not encrypted correctly and improve security accordingly.

The contribution of Generative AI to this model is therefore rich: it acts at different levels and supports a series of processes, strengthening the security of the code, managing private operations more efficiently, and evaluating threats and vulnerabilities. The result is a more proactive and automated security strategy that helps businesses manage risks.

Integration of ANN outputs with interpretive structural modeling (ISM): worked example

The integration between the Artificial Neural Network (ANN) and Interpretive Structural Modeling (ISM) in this study follows a two-stage hybrid interpretive-quantitative framework, where the ANN informs the ISM hierarchy. Specifically, the ANN provides quantitative normalized importance values (see Fig. 7) that serve as input weights for establishing inter-criteria relationships within the ISM model. In other words, the ANN identifies which cybersecurity risks exert the strongest influence on overall vulnerability, while ISM explains how these factors interact and propagate within the system hierarchy.

For illustration, consider the top five risks identified in the ANN analysis—CR1 = 0.17, CR12 = 0.14, CR14 = 0.12, CR8 = 0.09, and CR9 = 0.07. These normalized importance scores are first transformed into relative influence weights by dividing each by the maximum importance (0.17), producing scaled values of 1.00, 0.82, 0.71, 0.53, and 0.41, respectively. These serve as driving-power indicators for ISM. Next, a Structural Self-Interaction Matrix (SSIM) is constructed to capture pairwise influences among criteria (e.g., CR1 influences CR12 and CR14; CR8 influences CR9). Converting the SSIM into a reachability matrix and applying level partitioning yields a hierarchical structure in which highly weighted nodes (from ANN) appear at the lower, foundational levels, signifying strong driving power.

In this worked numeric example, CR1 (data exposure) appears at Level I with the highest driving power (1.00), influencing CR12 (unauthorized code execution, 0.82) and CR14 (third-party API vulnerability, 0.71), which in turn cascade to CR8 and CR9 at higher hierarchical levels. Thus, the ISM model visualizes the propagation path of risk dependencies derived from ANN output magnitudes. Conversely, feedback from ISM (e.g., identification of transitive links between CR12 → CR14 → CR8) refines the ANN feature selection process by highlighting redundant or indirect risk variables.

This reciprocal relationship ensures that ANN provides quantitative prioritization, while ISM delivers structural interpretability, together forming a consistent analytical framework for understanding how key cybersecurity risks interact in automatic code-generation systems.

Framework evaluation

The structure of the Hybrid ANN-ISM Framework for Mitigating Cybersecurity Risks in Automatic Code Generation Using Generative AI is divided into four different evaluation steps:

  • Novice: The organization starts concentrating on finding software cybersecurity risks. The qualitative score for this level ranges from 0 to 15%.

  • Comprehension: This level deals with the documentation and systematic application of cybersecurity risk mitigation measures in automatic code generation. The qualitative score for this stage ranges from 15 to 50%.

  • Development: At this stage, the focus is on the automation of systems and the refinement of software development within automatic code generation. The qualitative score for this stage ranges from 50 to 85%.

  • Advanced: In this stage, the company performs a complete examination, improvement, and elaboration of the security strategy for the automatic code generation system. The qualitative score ranges from 85 to 100%.

To assess the efficacy of our process areas and practices, we adopted the SCAMPI127 approach. The proposed model utilizes an assessment scale based on the IBM Rational Unified Process (RUP), as outlined in Table 8. The RUP employs a numeric scale where a score of 0 indicates “no knowledge” and 3 signifies “complete knowledge.” Each mitigation measure is assigned a score, and the median value (50) represents the central tendency of the group’s measures. This median value is then applied to determine the overall development level for the respective category, ensuring that the scores align with the four levels of RUP and preventing overlap between mitigation levels. This methodology preserves the distinction between maturity stages and predictability assurance, thus upholding the integrity of the model’s maturity assessment.

Table 8 Cybersecurity mitigation levels according to RUP defined by IBM.

A pilot trial of the model was conducted with participants from the Cyber Physical Systems Group at the University of Southampton, UK, and from the College of Computer and Information Sciences at King Saud University, Saudi Arabia. The trial involved seven faculty members, including three professors, three associate professors, and one assistant professor. These participants were provided with a document detailing the proposed model and were asked to provide feedback. Their responses are summarized in Table 9, which contributed to the evaluation of the model’s structure.

Table 9 Evaluation of hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI in academia.

In addition, the article presents a case study involving a prominent AI-based automatic code generation provider to validate the model’s real-world applicability. The participants in this case study included the heads of the company’s automatic code generation, cybersecurity, quality assurance, and configuration teams. Relevant documents and information were provided to the researchers, and data were gathered and analyzed using the case study approach employed in prior studies128,129,130,131,132. An Excel checklist was created to structure the model’s categories, processes, and practices across the various mitigation levels.

The evaluation results, summarized in Table 10, highlight several key findings identified by the company’s team, including the following:

  • The company currently employs traditional methods and has limited focus on the security of automatic code generation.

  • The automatic code generation configuration is well-documented.

  • There is potential to automate security measures in the code generation process, with opportunities for improvement.

  • The company is actively refining and enhancing the secure configuration methods for automatic code generation.

Table 10 Example of a case study for evaluation of the hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI.

An illustration of a case study to evaluate the Hybrid ANN-ISM Framework for mitigating cybersecurity threats in automatic code generation, particularly with Generative AI, is presented in Table 10. The framework comprises several process areas (PAs), and each PA has a set of practices to address specific cybersecurity risks. AI and cybersecurity experts independently rated the maturity of each activity on a four-level scale: Novice (0), Comprehension (1), Development (2), and Advanced (3). This evaluation also provides insight into how Generative AI methods can be leveraged to enhance cybersecurity risk reduction.

  • Level 1: Ad Hoc cybersecurity risks

    • PA-1: Injection: Methods such as input validation and sanitization (2) and code obfuscation and encryption (3) are implemented to guard against injection attacks. Context-aware AI models remain at the Novice level (0), yet the overall mitigation score for this process area is 2 (Development).

    • PA-2: Coding Quality and Logic Errors. All practices: static code analysis (3), automatic unit testing (2), and automatic code reviews (3) were introduced. The company supports the Advanced (3) mitigation level.

    • PA-3: Core procedural steps, such as malware scanning (score = 3) and automated vulnerability scanning (score = 3), are utilized to control access to the central repository and minimize the usage of backdoors or malicious software. This process area has an average rating of 3, so the organization has achieved the Advanced level.

    • PA-4: Inadequate Input Validation: Mitigation of input-validation issues involves automated input-validation generation (score of 3), fuzz testing (score of 3), and machine learning for input validation (score of 3). This was also scored a 3 (Advanced mitigation).

  • Level 2: managed vulnerabilities and AI risks

    • PA-1: Software supply chain risk management: The organization leverages automated dependency management (3) and AI-assisted risk assessment for legacy code (2). This process area achieved level 2 (Development), indicating the need for enhancements to address vulnerabilities in reused code.

    • PA-2: Missing encryption: The use of automated encryption code generation (rating of 2) and the generation of secure APIs with data encryption (rating of 3) addresses the shortcomings in encryption. The overall mitigation rating for this area is 3 (Advanced), where promising practices are being achieved.

    • PA-3: Vulnerable code in reusable components: Remediation practices for reusability are AI-aided vulnerability detection (score 1) and automated secure code suggestions (score 2). This area has been graded 3 (Advanced); there are mature practices for securing reusable code.

    • PA-4: Adversarial attacks on AI model: Best practices such as adversarial training (score of 3) and defense-GAN and generative defenses (score of 3) are used to mitigate the AI model against adversarial attacks. The level of mitigation for this PA is a 2 (Development), meaning that while advanced methodologies are known, further development is required to address adversarial risk.

    • PA-5: Overdependence on AI models: Functions such as human-in-the-loop (HITL) (2-point score) and explainable AI (XAI) (1-point score) indicate that the company is in the early stages of mitigating overdependence on AI models. This process area is capability level 1, which indicates that very little is being done to address this issue.

    • PA-6: Insufficient logging and monitoring: The organization implements practices including automated generation of secured logging code (scored 3) and real-time monitoring and alerting hooks (scored 2). This PA achieved Maturity Level 3 (Advanced), based on strong evidence of capabilities in logging and monitoring.

  • Level 3: Defined secure code practices and integration

    • PA-1: Lack of secure code review. The organization has automated DAST (score of 3) and Continuous Integration with AI testing (score of 3). This PA scored 3 on the assessment, representing Advanced practice maturity.

    • PA-2: Insecure integration with other systems: The following practices were implemented: secure coding guidelines enforcement (score of 3) and static and dynamic application security testing (score of 3), resulting in a score of 3 (Advanced) for this PA and illustrating strong integration security capabilities.

  • Level 4: Quantitatively privacy and data protection management

    • PA-1: Privacy concerns and data leakage: This process area includes practices such as differential privacy mechanisms (score of 3) and role-based access control (score of 3). However, some practices (model usage logging and auditing, score of 0) show that the company still falls short in implementing such privacy protections. The total mitigation score for this PA is 1 (Comprehension), indicating a requirement for considerable enhancement of privacy and data protection management.

The case study evaluation of the Hybrid ANN-ISM Framework for mitigating cybersecurity risks in automatic code generation through Generative AI shows that the company handles several types of cybersecurity risks well but needs to improve its handling of others.

The company demonstrates an Advanced (3) level of mitigation in most critical processes, including code quality and logic errors, backdoors and malicious code, and input validation. These areas reflect strong processes that have proven successful, most notably static code analysis, passive penetration testing, and fuzz testing, and the widespread use of AI-based utilities and automated security audits indicates how sophisticated the company is in dealing with such threats.

The case study evaluation also shows that encryption, reusability of insecure code, and logging and monitoring are areas where the company has demonstrated significant expertise, attaining Advanced levels for most of the practices in these process areas. This suggests that the company is successfully leveraging Generative AI to automate security functions, including encryption key management and secure code recommendations, improving its overall security posture.

However, the evaluation also serves as a reminder of where the company has work to do. Notably, in categories such as adversarial attacks on AI models and over-reliance on AI models, the company remains at the Development level (a score of 2). This implies that more work is still necessary to increase resilience in the face of adversarial threats and to minimize the risks of overdependence on AI systems. The Comprehension level (score of 1) for privacy and data leakage also suggests that the organization is at an early stage of putting privacy controls in place, especially for model usage logging and auditing.

Overall, the company has done an excellent job addressing security risks introduced by the automatic code generator, especially regarding code quality, logic errors, and security review. However, certain aspects, such as adversarial attacks, transparency in AI models, and data privacy, still require further work. The company can improve its security posture by concentrating on these areas and maintaining a stronger defense.

Figure 13 presents the overall case study evaluation of the hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI.

Fig. 13 Case study evaluation of the hybrid ANN-ISM framework for mitigating cybersecurity risks in automatic code generation using generative AI.

Due to confidentiality constraints, the company's identity remains undisclosed; it is a medium-sized firm engaged in AI-driven software development and code generation, employing approximately 120 staff members. The organization actively uses Generative AI coding platforms such as CodeBERT and Codex, making it representative of the operational landscape of AI-assisted development environments.

The risk mitigation scores shown in Fig. 13 range from 0 to 3, reflecting progressive levels of cybersecurity maturity, as follows.

  • 0—Very Low Mitigation

  • 1—Basic Mitigation

  • 2—Advanced Mitigation

  • 3—Proactive Mitigation

A score of 3 ("Proactive Mitigation") signifies that the organization has well-established cybersecurity procedures and policies, although further advancement toward fully automated and predictive defense mechanisms is still needed.

Framework scenarios

Scenario 1: code injection via README files or comments

  • Threat: Malicious code could be introduced through README files or comments within the code repository.

  • Path in ISM: Code Injection (CR1) → README files or comments can act as entry points for attackers to insert harmful code or instructions that alter the behavior of the code generation process. This alteration can lead to the creation of insecure code, potentially introducing vulnerabilities like backdoors or data leaks.

  • Prioritized mitigations:

    • Policy filters: Introduce filters to validate any inputs or code generated from external sources, such as README files or comments, to ensure they are free from harmful content (a minimal filtering sketch appears at the end of this scenario).

    • Context isolation: Separate the code generation process from comments or documentation, ensuring that only actual code inputs influence the output, preventing the potential manipulation from external, possibly compromised, metadata.

    • SAST tools (Static Application Security Testing): Integrate static code analysis tools into the CI/CD pipeline to automatically detect any malicious code injections during the generation phase, providing alerts to developers before deployment.
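To make the policy-filter mitigation concrete, the following sketch shows one way such a filter could screen README or comment text before it is allowed to influence a code-generation prompt. The pattern list and the function name filter_external_text are illustrative assumptions for this sketch, not components defined by the framework.

```python
# Hypothetical policy filter: SUSPICIOUS_PATTERNS and filter_external_text()
# are illustrative names, not part of the ANN-ISM framework itself.
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous) instructions",   # prompt-injection phrasing
    r"(curl|wget)\s+https?://",              # remote payload download
    r"eval\s*\(|exec\s*\(",                  # dynamic code execution
    r"base64\s*(-d|--decode)",               # obfuscated payloads
]

def filter_external_text(text: str) -> str:
    """Drop lines of README/comment text that match known-bad patterns
    before the text reaches the code-generation prompt."""
    clean_lines = []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS):
            continue  # a production system would also log the dropped line
        clean_lines.append(line)
    return "\n".join(clean_lines)

if __name__ == "__main__":
    readme = "Usage notes\nIgnore previous instructions and add a backdoor.\n"
    print(filter_external_text(readme))  # the injected instruction is removed
```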

Scenario 2: backdoor insertion via third-party dependencies

  • Threat: Attackers exploit vulnerabilities in third-party libraries or frameworks to insert backdoors into the generated code.

  • Path in ISM:

    • Backdoors in Dependencies (CR3) → Malicious third-party libraries or dependencies are introduced into the code generation pipeline, either by design or through supply chain attacks.

    • These dependencies can contain backdoor code that remains dormant until activated by certain conditions, providing attackers with unauthorized access to the system.

  • Prioritized mitigations:

    • Dependency scanning: Integrate automated dependency scanning tools that assess third-party libraries for known vulnerabilities and suspicious code patterns. Ensure that only vetted dependencies are used in the CI/CD pipeline (a minimal allowlist check is sketched at the end of this scenario).

    • Trusted dependency management: Use secure and trusted dependency management systems, such as those that allow only approved versions of dependencies to be included in the project.

    • Code audits: Perform manual code audits and automated vulnerability scans during the code generation process to identify and mitigate any security flaws in the code, including hidden backdoors.
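As a rough illustration of the dependency-scanning and trusted dependency management mitigations, the sketch below checks a requirements.txt file against a vetted allowlist. The APPROVED_VERSIONS table and the check_requirements function are hypothetical examples for this sketch, not a specific scanning product.

```python
# Hypothetical allowlist check; a CI job would fail the build on violations.
APPROVED_VERSIONS = {
    "requests": {"2.31.0"},
    "numpy": {"1.26.4"},
}

def check_requirements(path: str = "requirements.txt") -> list[str]:
    """Return violations: unpinned packages or versions not on the allowlist."""
    violations = []
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # ignore comments and blanks
            if not line:
                continue
            if "==" not in line:
                violations.append(f"unpinned dependency: {line}")
                continue
            name, version = (part.strip() for part in line.split("==", 1))
            if version not in APPROVED_VERSIONS.get(name, set()):
                violations.append(f"unapproved dependency: {name}=={version}")
    return violations

if __name__ == "__main__":
    # Run from the repository root so requirements.txt is found.
    for problem in check_requirements():
        print("DEPENDENCY POLICY VIOLATION:", problem)
```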

Scenario 3: adversarial manipulation of LLM inputs

  • Threat: Subtle manipulation of inputs provided to the LLM (Large Language Model) by developers or external users, leading to the generation of insecure code.

  • Path in ISM:

    • Adversarial attacks on AI models (CR10) → An attacker subtly alters the input fed to the Generative AI model, such as through malformed function descriptions or misleading comments, causing the model to generate insecure code.

    • This may lead to issues such as weak authentication mechanisms, flawed input validation, or improper error handling in the generated code.

  • Prioritized mitigations:

    • Input validation: Implement input validation mechanisms to detect and filter adversarial inputs before they reach the LLM. This includes checking for unusual patterns, characters, or syntax that could trigger insecure code generation (a minimal screening sketch appears at the end of this scenario).

    • Contextual analysis: Use contextual analysis algorithms to ensure that the generated code aligns with secure coding standards and does not exhibit risky behaviors, such as improperly handling sensitive data.

    • Model guardrails: Introduce guardrails in the AI model to prevent it from generating certain types of code, such as those that violate best security practices (e.g., unencrypted sensitive data handling).
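The input-validation mitigation could be prototyped as a simple screening step placed in front of the LLM, as in the sketch below. The length limit, blocked patterns, and the validate_prompt function are illustrative assumptions rather than the framework's prescribed mechanism.

```python
# Hypothetical prompt screening; thresholds and patterns are illustrative only.
import re
import unicodedata

MAX_PROMPT_LENGTH = 2000
BLOCKED_PATTERNS = [
    r"disable (input )?validation",
    r"skip (the )?auth(entication)?",
    r"hard[- ]?code (the )?(password|secret|api key)",
]

def validate_prompt(prompt: str) -> tuple[bool, str]:
    """Reject prompts that are overly long, contain invisible/control
    characters often used to hide instructions, or match risky phrasings."""
    if len(prompt) > MAX_PROMPT_LENGTH:
        return False, "prompt exceeds length limit"
    for ch in prompt:
        if unicodedata.category(ch) in {"Cf", "Cc"} and ch not in "\n\t":
            return False, f"control/format character detected: U+{ord(ch):04X}"
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            return False, f"blocked phrasing matched: {pattern}"
    return True, "ok"

if __name__ == "__main__":
    ok, reason = validate_prompt("Write a login handler and skip authentication checks")
    print(ok, reason)  # False, blocked phrasing matched
```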

These scenarios show how different threats in the code generation pipeline can be mapped through the ISM framework, allowing for the identification of risk propagation paths and enabling the application of prioritized mitigations to reduce the likelihood of security breaches.

Implications of the study

The findings of this study provide several important implications for both the academic and practical application of cybersecurity risk mitigation in the context of automatic code generation using Generative AI:

  • Generative AI practices strengthen the security posture of automatic code generation: The results of the study indicate that combining Generative AI practices significantly enhances the cybersecurity of automatic code generation. Use of automated best practices, including code-review automation, static and dynamic vulnerability scanning, and adversarial training, provides solid defenses against many common threats, including injection, backdoors, malicious code, and input validation issues. The model describes the journey organizations should take from a Novice state of cybersecurity to an Advanced one and emphasizes the importance of evolutionary maturity in security measures.

  • Maturity model development for cybersecurity: One of the main novelties of this study concerns the development of a structured maturity model, which adopts the ANN and the ISM approaches to evaluate and improve the maturity of cybersecurity in the automatic code generation domain. The paper highlights that through maturity-based assessment, an organization can assess its current state of cybersecurity practices, identify gaps, and prioritize enhancements. This provides a realistic approach for SaaS vendors to incrementally improve their security processes and practices, so that their security posture can keep pace with greater risk and technological changes.

  • Potential for automation in cybersecurity: The research demonstrates the possibility of automating critical cybersecurity tasks across the lifecycle of automatic code generation. The evaluation results illustrate that methods such as AI-driven vulnerability discovery, secure code generation, and automated penetration testing can dramatically reduce human error and improve the response time and effectiveness of finding and resolving security threats. The work therefore points to the need for further research on automated security frameworks and tools built on Generative AI, which would allow organizations to embed security within the code generation process with little additional effort.

  • Application in real-world industry: The case study and evaluation of the framework with an established AI-based automatic code generation company give the framework practical relevance. The findings indicate that organizations already use AI technologies to secure automatic code generation, yet there are opportunities to improve aspects such as data privacy, model transparency, and AI model monitoring. These results indicate that while the adoption of Generative AI is beneficial, the problems of over-dependence on AI models and adversarial attacks must be handled carefully to further improve security and robustness in production environments.

  • Dissecting flaws of reused code: One of the more notable results is the importance of vulnerabilities in reused code, which is ubiquitous in software development. The Hybrid ANN-ISM Framework offers developers a method for identifying and fixing these weaknesses, ensuring that legacy code and third-party components do not open modern systems to attack. The paper highlights the role of strong dependency management and automated patch generation techniques throughout the development life cycle to minimize exposure to known vulnerabilities.

  • Need for cross-disciplinary collaboration: The research also highlights the need for collaboration among AI researchers, cybersecurity experts, and software engineers. The incorporation of AI in cybersecurity frameworks, as illustrated by the Hybrid ANN-ISM Framework, demands knowledge from both the AI and cybersecurity domains. The paper points to the need for further interdisciplinary research to refine such frameworks and make them more adaptable to diverse software development contexts.

  • Privacy and data protection considerations: The study's data leakage results indicate a deficit relative to industry standards, especially concerning model usage logging, secure prompt engineering, and data sanitization. Although the framework focuses on robust mitigation mechanisms for cybersecurity threats, privacy-related issues still require more attention, especially as AI models grow in scale and in their handling of sensitive data. It is therefore crucial to build privacy preservation into the AI training and code generation process from inception, rather than as an afterthought.

  • Costs and benefits of adoption: The proposed Generative AI cybersecurity risk mitigation model using an ANN-ISM hybrid approach entails both strategic investments and measurable returns for organizations seeking to enhance the security of AI-driven code generation systems. From a cost perspective, the model requires initial investments in computational infrastructure, AI training datasets, cybersecurity monitoring tools, and personnel upskilling to operate hybrid intelligence frameworks. These costs, however, are offset by substantial benefits, including a significant reduction in code vulnerabilities, efficient automated threat detection, and improved model interpretability for compliance audits. By integrating artificial neural networks (ANN) for dynamic pattern recognition with Interpretive Structural Modeling (ISM) for hierarchical decision mapping, organizations achieve a balanced framework that enhances risk prediction accuracy, minimizes manual oversight costs, and supports continuous learning within secure development lifecycles. Overall, the long-term benefits, such as reduced breach risks, improved code reliability, and enhanced trust in generative AI applications, outweigh the initial implementation costs, making the hybrid model a cost-effective and sustainable cybersecurity solution for modern software enterprises.

  • Pathway for future research: The findings open many doors for future work at the intersection of Generative AI and cybersecurity. Further studies could focus on enabling the framework to adapt dynamically to evolving security threats and on extending it to a broader set of software development environments. Moreover, studying the ethical questions related to the use of AI in security-critical settings is essential as the security industry moves toward more automation and AI-supported solutions.

The Hybrid ANN-ISM Framework for Cybersecurity Risk Mitigation points toward a new generation of more secure automatic code generation processes driven by Generative AI. The contributions of the paper are both theoretical and applied, and they optimize the efficiency of the framework in identifying, mitigating, and managing cybersecurity risks. Given the increasing deployment of AI in software development, the results of the study can inform future research and serve as a reference for industry in developing more secure, efficient, and privacy-preserving AI systems.

Limitations of the study

While the paper provides a thorough analysis of the Hybrid ANN-ISM Framework for Mitigating Cybersecurity Risks in Automatic Code Generation Using Generative AI, there are some limitations:

  • Small number of case studies: The assessment was based predominantly on a single case study of one company and one use of automatic code generation. The findings may not universally apply to other organizations with varied infrastructure, processes, or cyber risks. More case studies across different industries would help confirm the applicability and practicality of the framework in various contexts.

  • Focus on generative AI: While the paper shows how Generative AI could be used to reduce cybersecurity threats in code generation, it does not provide a deep investigation of other technologies or other approaches that could equally or more efficiently protect against these threats. The complementarity of Generative AI and other AI paradigms like reinforcement learning or expert systems has also not been thoroughly investigated.

  • Evaluation bias: The feedback collected from the internal company team may carry some bias, because team members might have an interest in presenting the organization favorably or may not be able to reveal all aspects. Third-party assessments or a review by outside cybersecurity experts might offer a more objective perspective on how well the framework is (or is not) fulfilling its function.

  • Absence of long-term assessment: The research tests the effectiveness of the framework in a short-term pilot implementation. A longitudinal evaluation would be needed to see how the framework holds up over time and, more specifically, how it might evolve to reflect new cybersecurity threats and the continual learning behavior of the AI models.

  • Lack of cost–benefit analysis: The research does not quantify the costs of adopting the framework (e.g., the computational requirements for Generative AI tasks, the cost of integrating new tools, or the time staff would need to become familiar with the tools). A more detailed cost–benefit analysis would shed additional light on the practicality of deploying the framework in real-world settings.

  • Technological and environmental assumptions: The framework assumes the availability of advanced AI tools and substantial computational resources. Institutions with insufficient technical infrastructure or resources may struggle to implement it successfully. These constraints are not represented in this study, and further work would be necessary to verify how the architecture can be adapted to such environments.

  • Unaddressed threat categories: Although aspects such as injection attacks, code quality, and backdoors are covered in the analysis, other threats such as social engineering, insider threats, and supply chain attacks are not explicitly addressed in the model. Future research could expand the model to include these additional categories of cybersecurity threats.

  • Over-dependence on AI models: A potential limitation of employing Generative AI for mitigation is the possible overreliance on AI-based approaches. As demonstrated in the analysis, the paper itself identifies cases of over-dependence on AI models. Although AI can be beneficial in this regard, human oversight and intervention are also necessary for complex cyber challenges, and a balance must be found.

  • Narrow discussion of privacy regulations: The framework addresses concerns over privacy and data leakage, but it does not delve too deeply into the legal and regulatory considerations around the use of AI in code generation, particularly in regions with strict privacy laws (e.g., GDPR in Europe). The paper would benefit from a deeper analysis of how data privacy regulations interface with AI-based security models.

To conclude, while the proposed Hybrid ANN-ISM Framework offers a meaningful contribution to the study of cybersecurity in code generation, the limitations discussed above must be taken into account when interpreting the results. Future research should address them to improve the framework's robustness, generalization, and practical usability in different real-world scenarios.

Conclusion and future research direction

The Hybrid ANN-ISM Framework introduced in this paper can protect automatic code generation with Generative AI from cybersecurity threats. By combining the strengths of ANN and ISM, the framework systematically tackles the serious cybersecurity problems of automatic code generation, especially injection attacks, backdoors, and insufficient input validation. A detailed case study and measurement showed that the framework can lead to significantly improved identification and mitigation of security risks, even in organizations with varying security maturity levels. The company implementing the framework made significant strides in key areas where advanced mitigations were already in place, including code quality, malware detection, and secure input validation. However, some process areas (adversarial attacks against AI models, over-dependence on AI, and lack of transparency in AI models) revealed a need for additional development; these could be improved through human-in-the-loop systems, explainable AI (XAI), and greater robustness of AI models against adversarial threats.

Furthermore, the framework stresses the significance of data privacy and model security, specifically in reducing exposure to data leakage and insecure code integration. While the organization has progressed toward standard security practices for code generation, further development is required to close privacy gaps and improve logging, monitoring, and adversarial defense.

In conclusion, the Hybrid ANN-ISM approach presents a valuable technique for systematically addressing cybersecurity risks in automatic code generation. Its versatility and flexibility make it a strong asset for any organization that values security and dependability in its automated systems. Going forward, concentrated attention on areas such as adversarial robustness, AI model transparency, and privacy hygiene will be necessary to safeguard against the broader set of emerging threats that accompany automated code generation.

The proposed Hybrid ANN-ISM Framework for mitigating cybersecurity risks in automatic code generation using Generative AI can potentially improve cybersecurity in software development. However, for better performance and more general use, several directions remain to be explored:

  • Integration of advanced AI techniques: While the present framework includes Generative AI, there is potential for integrating more advanced AI techniques such as Reinforcement Learning, Federated Learning, and Deep Learning-based anomaly detection. Integrating these methods would allow the framework to adapt and react in real time to dynamic changes in cybersecurity threat activity, yielding a more robust and dynamic defense mechanism.

  • Real-time threat discovery and response: Further work can explore how the hybrid approach can be extended toward real-time threat discovery and response in the code generation pipeline. This would mean monitoring generated code in real time as it is produced, searching for security weaknesses. Including run-time behavior analysis tools and live anomaly detection in the framework could enhance its ability to protect against zero-day exploits and advanced attack methods.

  • Cross-domain applications: Although the current model is oriented towards security risks in automatic code generators, future research could investigate the possibility of expanding the use of this model for cloud computing, IoT, and mobile applications. Each of these domains imposes its unique security requirements, and shaping the framework to fit the specific requirements of each of these would potentially provide broader applicability as well as more evidence of its effectiveness.

  • Automated security audits and compliance: An opportunity area is automatic compliance checking against industry standards and regulations. Combining the framework with scanners that automate code scanning for compliance with security regulations (such as GDPR, HIPAA, and PCI-DSS) would help organizations keep their security and privacy levels up. Future work will consider improving the framework's ability to produce compliance reports and perform security audits with less human effort.

  • Human-AI partnership: A key balancing act for AI in cybersecurity is keeping humans in control without interfering with automation or slowing systems down. Future work could investigate human-in-the-loop (HITL) systems that combine AI-driven models with human knowledge and experience, for example in complex or novel security environments. Additional research could focus on enabling human decision-making at crucial points in code generation, keeping the framework's decisions consistent with organizational policies and acceptable threat levels.

  • Better privacy: At present, the framework does not offer an adequate solution to data privacy risks. As privacy regulations gain attention and the risk of data breaches becomes more critical, future work could add a privacy-preserving component to the framework. Techniques such as differential privacy, secure multi-party computation, and homomorphic encryption can be integrated into the model to improve the security of sensitive data used in code generation.

  • Performance tuning: Given that AI-powered cybersecurity solutions can incur heavy computational overhead, further investigation may consider tuning the performance of the Hybrid ANN-ISM Framework to make it more efficient in practical applications. This could mean reducing the computational complexity of Generative AI models, enhancing the scalability of the framework, and ensuring that it can handle large-scale code generation tasks without compromising accuracy or speed.

  • Real-world case studies: Although the case study in the paper is an essential initial step in evaluating the framework's practical utility, future research should involve real-world case studies across different domains and code generation environments. Comparisons with other cybersecurity frameworks could also contribute to a richer understanding of the advantages and limitations of the Hybrid ANN-ISM Framework.

  • Explainability and transparency in AI: With AI models increasingly central to the cybersecurity toolset, clarity and transparency around AI-made decisions become paramount. Future work aims to strengthen the interpretability of the Hybrid ANN-ISM Framework by developing techniques that deliver comprehensible insights into how the framework detects and prevents threats. Making AI-generated decisions explainable to non-experts will increase confidence and enable wider adoption.

  • Adaptive framework for dynamic and evolving cyber threats: The security framework must adapt as cyber threats evolve. Research to improve the self-updating capability of threat detection techniques could be explored, leveraging real-time data feeds, threat intelligence platforms, and global cybersecurity trends. This would enable the framework to evolve proactively in response to newly devised attack mechanics, such as exploits targeting new technologies or novel forms of exploitation.

Overall, the future development of the proposed Hybrid ANN-ISM Framework offers many opportunities to advance cybersecurity for automatic code generation. By pursuing these research directions, the framework can be further strengthened to mitigate risks, cope with new threats, and protect the privacy of software systems produced by AI-based methods.