Table 1 Cybersecurity risks in automatic code generation.
| Code no | Cybersecurity risks (CRs) | Description | Examples |
|---|---|---|---|
| CR1 | Injection attacks19 | Automatic code generation can inadvertently introduce injection vulnerabilities, such as SQL or command injection, where attackers can inject malicious input to manipulate the code’s behavior | Malicious inputs that manipulate database queries or system commands to leak sensitive data or gain unauthorized access |
| CR2 | Code quality and logic errors64 | Generated code may contain logical flaws or inefficiencies that lead to vulnerabilities. These flaws can arise from the AI's limited contextual understanding and might result in improper error handling, unchecked data flows, or poor design practices | Poor input validation or incorrectly thrown exceptions might expose the system to exploitation |
| CR3 | Malicious code injection into AI models | Adversaries could inject backdoors or other types of malicious code into the generative AI model during the model training process, or the model can mistakenly output insecure code | A malicious actor could train an AI system on malicious software, leading the tool to create software with embedded vulnerabilities or unauthorized entry points |
| CR4 | Vulnerabilities in reused code (legacy dependencies)65 | Most auto-generated code depends on existing libraries and frameworks whose risk profile is not being managed | Generating code that uses an old library version with a known buffer overflow, which an attacker could abuse |
| CR5 | Insufficient input validation30 | Generated code may not include sufficient input validation and could be compromised by users sending carefully crafted input designed to cause buffer overflows or execute malicious code | AI-generated code for user authentication fails to properly validate input fields, leaving it vulnerable to SQL injection or buffer overflow |
| CR6 | Weak authentication and authorization mechanisms66 | Generated code may lack secure, strong authentication and authorization mechanisms, exposing systems to unauthorized access, privilege elevation, etc | A web application created by AI might let users bypass the login, or its access control might be weak or ineffectively implemented, allowing privilege escalation |
| CR7 | Lack of encryption or insecure data handling67 | Generated code may overlook encrypting data or store it insecurely, inviting theft of or unauthorized access to data | A password management application that stores passwords or financial information in clear text; if the application is compromised, sensitive data can be easily captured |
| CR8 | Reusability of vulnerable code10 | Code sharing in AI models could propagate the same vulnerable code across different projects. Such security risks, if not addressed, can spread throughout different applications and enterprises | A widely used AI tool may produce code from an API with unchecked security flaws, and that code might go unnoticed across multiple systems |
| CR9 | Lack of secure code review and testing68 | AI-generated code might not be scrutinized as thoroughly as hand-written code, so security vulnerabilities could be missed | The source code is not necessarily subjected to static analysis for security weaknesses and exploitable vulnerabilities |
| CR10 | Adversarial attacks on AI models21 | The AI model used for code generation could be attacked by adversaries who manipulate the training data so that the AI outputs malicious or incorrect code | An adversarial input can fool the AI into producing code with vulnerabilities or adding backdoors to the system |
| CR11 | Over-reliance on AI model21 | Developers can rely too heavily on AI-generated scripts and skip due diligence, resulting in vulnerable systems or code that is not fully understood | Unthinkingly adopting AI-generated code without scrutinizing it can result in insecure algorithms or suboptimal security methods |
| CR12 | Privacy issues and data leakage69 | AI models can produce code that unintentionally exposes private data, resulting in data leakage or violations of privacy regulations such as GDPR or HIPAA | An AI-created web form might mismanage personal data and expose user information to third parties |
| CR13 | Insecure integration with other systems70 | Auto-generated code may create vulnerabilities if not safely integrated with other systems or services | A generated code snippet for third-party payment API integration might expose an API key or handle payment-gateway authentication poorly |
| CR14 | Insufficient logging and monitoring71 | Code produced by AI may lack proper logging and monitoring capabilities designed to trace anomalous activity or incidents | Without adequate logging, serious security incidents, including attempts to gain unauthorized access, could go unnoticed and unchallenged |
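To make CR1 and CR5 concrete, the following minimal Python sketch contrasts a vulnerable string-interpolated SQL query with a parameterized one. All names (table `users`, functions `find_user_vulnerable`/`find_user_safe`) are hypothetical and chosen only for illustration; the point is the pattern, not a specific generated codebase.

```python
import sqlite3

def setup_db():
    # In-memory database with one illustrative record
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    conn.commit()
    return conn

def find_user_vulnerable(conn, name):
    # UNSAFE (CR1/CR5): user input is interpolated directly into the SQL
    # string, so crafted input can alter the query's logic
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # SAFE: the driver binds the value as data, so it cannot change
    # the query structure
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = setup_db()
payload = "' OR '1'='1"  # classic injection payload
print(find_user_vulnerable(conn, payload))  # returns every row: [('alice',)]
print(find_user_safe(conn, payload))        # returns no rows: []
```

Reviewing generated database code for string-built queries of this kind, and replacing them with parameterized queries, addresses the most common instance of CR1.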