AI code security: vulnerabilities to check before delivery

AI code security is above all a question of control: a tool like Copilot, ChatGPT, or Claude can speed up development, but it can also suggest vulnerable code, questionable dependencies, or overly permissive logic. In 2025/2026, Veracode reports that 45 % of the tested generated-code tasks introduced known flaws. For an SME project, the time saved only has value if the security review is planned from the quote stage.

AI code security: the real risk is not AI, it’s validation

The intent behind a search about AI code security is rarely academic. A business leader wants to know whether the use of AI tools by their provider puts their website, mobile app, or customer data at risk. The short answer: not necessarily, provided that generated code is treated as a proposal, never as a finished deliverable.

Language models (LLMs, AIs trained on large volumes of text and code) often write plausible code. That is their strength. It is also their weakness. They can produce a function that works in a quick test, while forgetting to escape user input, handle permissions, or use a maintained dependency.

Veracode tested more than 100 LLMs in 2025 on Java, Python, C#, and JavaScript. Its benchmark covers 80 coding tasks and several CWE categories, meaning documented families of software weaknesses, such as SQL injection (CWE-89) or cross-site scripting (CWE-80). The overall result, confirmed in a Spring 2026 update, remains stable: about 55 % of code considered secure, 45 % vulnerable.

This figure does not mean that nearly one project out of two will be hacked. It says something else, more useful: without clear security instructions and without verification, AI too often introduces errors that are already known. Ordinary errors. Therefore avoidable.

What the recent numbers say to decision-makers

Sonar surveyed more than 1,100 professional developers in 2026. According to this survey, AI-generated or AI-assisted code already represents 42 % of committed code, and respondents expect that share to reach 65 % in 2027. In other words, even if you do not explicitly ask for AI, it is likely to appear in the production chain.

The problem is not its use. It is the gap between distrust and discipline. Sonar reminds us that 96 % of developers do not fully trust AI-generated code, but only 48 % always verify this code before sending it to the repository. This contradiction is very concrete for a budget: one hour saved coding can become three hours of correction if the flaw is detected late.

Veracode also breaks its results down by language. In its sample, Java shows a 72 % failure rate in security tests, compared with 38 % for Python, 43 % for JavaScript, and 45 % for C#. These results do not rank languages from best to worst in absolute terms. Rather, they remind us that a technical stack, a framework, and review habits matter just as much as the AI tool.

Source and year	Indicator	What this changes for a project
Veracode 2025/2026	45 % of tested AI code tasks were vulnerable	Plan for a systematic security review, not just functional testing
Veracode 2025	Undefended XSS/CWE-80 in 86 % of the relevant samples	Check the display of user data in web interfaces
Sonar 2026	42 % of committed code is said to be generated or assisted by AI	Ask how the agency traces and reviews AI-assisted code
Sonar 2026	48 % of developers always verify before commit	Formalize validation in the workflow, not in intentions
GitHub 2026	CodeQL/Copilot autofix covers more than 90 % of alert types in JavaScript, TypeScript, Java, and Python	Automate part of the detection, without replacing human review

The vulnerabilities to look for as a priority in generated code

The most dangerous errors are not always spectacular. A SQL injection allows an attacker to hijack a database query. An XSS, or Cross-Site Scripting, injects script into a page viewed by a user. A leak of sensitive information exposes an API key, a token, or personal data.
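To make the first two flaws concrete, here is a minimal Python sketch of the usual defenses: a parameterized query against SQL injection and output escaping against XSS. The `customers` table, the function names, and the in-memory database are assumptions chosen for illustration, not code from a real project.

```python
import html
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one customer (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO customers (email) VALUES (?)", ("alice@example.com",))
    return conn


def find_customer(conn: sqlite3.Connection, email: str):
    """Look up a customer with a parameterized query.

    The `?` placeholder makes sqlite3 pass `email` as data, so it is never
    parsed as SQL, whatever characters it contains.
    """
    cursor = conn.execute(
        "SELECT id, email FROM customers WHERE email = ?", (email,)
    )
    return cursor.fetchone()


def render_comment(comment: str) -> str:
    """Escape user-supplied text before placing it inside an HTML page."""
    return "<p>" + html.escape(comment) + "</p>"
```

With this version, a classic attempt such as `' OR '1'='1` is compared as a literal string and matches no customer, and a `<script>` tag in a comment is displayed as text instead of being executed. In a real project, the framework's ORM and template engine usually provide these protections, as long as the generated code does not bypass them with string concatenation.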

OWASP, the community reference in application security, lists in its Top 10 for Large Language Model Applications risks related to the use of LLMs: poor output handling, supply chain vulnerabilities, excessive trust, disclosure of sensitive information, and excessive agent autonomy. These risks also apply to traditional web projects as soon as an AI assistant takes part in development.

A common pitfall involves dependencies. A dependency is an external software building block added to the project, for example an npm package in JavaScript or a Python library. Cloudsmith warned in 2026 about AI-hallucinated packages and “slopsquatting,” a practice where a plausible package name or one close to a real package can lead to an unreliable component. For a non-technical person, this is invisible in a demo.
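One simple safeguard against this pitfall can be sketched in a few lines of Python: compare every dependency an assistant suggests with a list of packages the team has already reviewed, and flag names that are merely close to a reviewed one. The `APPROVED` list, the function name, and the 0.8 similarity threshold are assumptions for illustration; name similarity is only one signal and does not replace checking that the package exists, is maintained, and comes from the expected publisher.

```python
from difflib import get_close_matches

# Hypothetical allowlist: packages the team has already reviewed.
APPROVED = frozenset({"requests", "flask", "sqlalchemy", "pydantic"})


def classify_dependency(name: str, approved: frozenset = APPROVED) -> str:
    """Classify a suggested package name against the reviewed list.

    Returns "approved" for an exact match, "lookalike" when the name is
    close to an approved one (possible typo or hallucinated variant), and
    "unreviewed" otherwise.
    """
    normalized = name.strip().lower()
    if normalized in approved:
        return "approved"
    # cutoff=0.8 is an arbitrary threshold for this sketch.
    if get_close_matches(normalized, sorted(approved), n=1, cutoff=0.8):
        return "lookalike"
    return "unreviewed"
```

A name such as `reqeusts` is classified as a lookalike and deserves a manual check before installation, while a package that resembles nothing on the list is simply marked as unreviewed.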

In the projects we carry out, we often see the same trade-off: AI is very helpful for producing repetitive code, unit tests, or a first version of an interface, but it is less reliable as soon as it has to handle permissions, payments, personal data, or server security. At that level, it is better to move more slowly than to fix a vulnerability in production.

The verification checklist before going live

Good governance does not mean banning AI. It means knowing where it is used, on which parts of the code, and with what safeguards. For an executive, the question to ask the provider is simple: “what proof of control do you provide before delivery?”

Identify the parts generated or heavily assisted by AI, especially in authentication, forms, payments, and administration.
Run a SAST analysis (static code analysis, without running the application) with tools like SonarQube, GitHub CodeQL, or Semgrep.
Check dependencies with npm audit, pip-audit, Dependabot, Snyk, or an artifact registry like Cloudsmith.
Check secrets: no API key, no password, no token in GitHub, GitLab, or Bitbucket.
Test user inputs: forms, search, comments, file upload, URL parameters.
Review business permissions: one customer must not be able to see another customer’s data, even if the interface does not show it.
Document accepted, fixed, or reported alerts, with a reason understandable to a decision-maker.
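The secrets check in the list above can be illustrated with a deliberately small Python sketch. The three patterns and the function name are assumptions chosen for the example; dedicated secret scanners ship far larger rule sets and also inspect the repository history, which this sketch does not.

```python
import re

# Illustrative patterns only; real scanners cover many more formats.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hard-coded credential": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str) -> list:
    """Return (line_number, pattern_name) for every line that looks like a secret."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

Run on each file before a commit, a check of this kind catches the most obvious mistakes, such as an API key pasted into a configuration file. It produces false positives and misses unusual formats, so it complements a secrets management policy rather than replacing one.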

GitHub indicates in 2026 that CodeQL and Copilot code-scanning autofix cover more than 90 % of alert types in JavaScript, TypeScript, Java, and Python, with a possible correction for more than two-thirds of the supported vulnerabilities with little or no editing. That is valuable. But an automatic correction does not always know whether a business rule is correct.

If your project uses AI agents connected to your tools, the issue becomes broader than code. The Model Context Protocol (MCP), for example, standardizes how agents connect to data and services; its value is real, but permissions must be framed, as explained in our analysis of the MCP for connecting AI agents to data. The more an agent can act, the more explicit its limits must be.

Budget, timeline: how much does a serious security review cost?

A reasonable security check has a cost, but it often remains lower than the cost of an incident. In France, depending on providers and the size of the scope, expect around €800 to €2,500 excluding tax for a light review of a small site or application module, and rather €3,000 to €8,000 excluding tax for a more complete application audit with code analysis, dependencies, and an actionable report. A penetration test scoped to a business application frequently exceeds €5,000 to €15,000 excluding tax.

The timeline depends on when the review takes place. Before going live, an automated pass plus a targeted review can take two to five business days on a well-organized SME project. After delivery, with a poorly documented codebase, the same checks can take twice as long. Honestly, skipping this step on an application that handles customer data is not a good trade-off.

The best cost/benefit ratio comes from integrating checks into the CI/CD pipeline, that is, the automated process that tests and deploys the software. A scan on every change costs little once configured. A vulnerability discovered at acceptance testing, on the other hand, disrupts the schedule, the budget, and sometimes even the commercial launch.
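As an illustration of such a pipeline check, here is a minimal Python sketch of a gate that reads a scanner report in SARIF, a JSON format that several static analysis tools can export, and fails the build when it contains blocking findings. The choice to block only on the `error` level, the function names, and the way the script is called are assumptions; tools differ on where they record severity, so a real gate has to be adapted to the scanner in use.

```python
import json
import sys

# Result levels that block a merge in this sketch (an assumption, adjust per project).
BLOCKING_LEVELS = {"error"}


def blocking_findings(sarif: dict) -> list:
    """Collect the rule IDs of results whose level is considered blocking."""
    blocked = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is treated as a warning here.
            level = result.get("level", "warning")
            if level in BLOCKING_LEVELS:
                blocked.append(result.get("ruleId", "<unknown rule>"))
    return blocked


def main(path: str) -> int:
    """Print blocking findings and return the exit code for the CI job."""
    with open(path, encoding="utf-8") as handle:
        report = json.load(handle)
    blocked = blocking_findings(report)
    for rule_id in blocked:
        print(f"blocking finding: {rule_id}")
    # A non-zero exit code makes most CI systems fail the job.
    return 1 if blocked else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Called with the path of the report as its only argument, the script leaves the decision visible in the pipeline log, which also gives the decision-maker a trace of what was blocked and why.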

Regulatory obligations reinforce this logic. The GDPR, applicable since 2018, requires personal data to be protected by appropriate measures. The European Cyber Resilience Act also creates new requirements for certain software and components; to anticipate this framework, you can read our overview of the obligations of the Cyber Resilience Act for software and plugins. If your activity falls within the scope of NIS2, the security of the site or application can no longer be treated as a late add-on; our guide on NIS2 applied to WordPress since October 2024 details this change.

When AI truly speeds things up, and when it becomes a bad choice

AI is relevant for generating component skeletons, suggesting tests, translating a piece of logic from one language to another, or documenting existing code. It can also help identify simple inconsistencies. On a project with a tight budget, it is useful if the time saved funds better acceptance testing and better security.

The obvious solution becomes a bad one when the team asks AI to quickly produce a sensitive feature without a framework. Authentication, Stripe payment, export of personal data, synchronization with a CRM, administrator back office: these areas deserve human design, then possibly AI assistance under supervision. Code that “seems to work” is not proof of safety.
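A typical case of code that "seems to work" is a function that returns a record by its identifier without checking who is asking. The Python sketch below shows the missing step, an ownership check on the server side. The `Invoice` model, the in-memory `INVOICES` store, and the function name are hypothetical and stand in for a real database and session layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_eur: float


class AccessDenied(Exception):
    """Raised when a user requests a record they do not own."""


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(1, owner_id=10, amount_eur=120.0),
    2: Invoice(2, owner_id=20, amount_eur=980.0),
}


def get_invoice(current_user_id: int, invoice_id: int) -> Invoice:
    """Return an invoice only if it belongs to the authenticated user."""
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours": the response should not
    # reveal whether another customer's invoice exists.
    if invoice is None or invoice.owner_id != current_user_id:
        raise AccessDenied(f"invoice {invoice_id} is not available")
    return invoice
```

Without the ownership comparison, a demo with a single account would look identical, which is why this kind of rule has to be tested with at least two customers rather than inferred from the interface.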

The technical choice also matters. A modern JavaScript project can rely on Node.js, Deno, or Bun, with different dependency and maturity models; this type of trade-off is addressed in our comparison of JavaScript runtimes in 2026. On the hosted AI side, sovereignty and confidentiality also matter: for French SMEs, the use of tools such as Mistral must be assessed from a GDPR perspective, as in our guide on Le Chat by Mistral and data protection.

On the agency side, the instinct is to separate what can be accelerated from what must be secured from the design stage onward. This boundary avoids many unnecessary debates: AI is neither prohibited nor accepted everywhere. It is governed.

What should you require from a service provider that uses AI for coding?

A serious service provider does not need to hide AI. They must explain how they use it, how they review it, which automated tools are in place, and what limits have been set. Transparency is better than a vague promise of productivity.

Ask for a version-controlled code repository, a validation history, a static analysis report, a status of dependencies, and a secrets management policy. If the application processes personal data, also ask where the prompts and code snippets are sent. The arXiv study published in April 2026 on security discussions around GitHub Copilot identifies four recurring concerns: data leakage, code licenses, adversarial attacks or prompt injection, and insecure code suggestions.

AI code security is therefore less a matter of tools than a matter of method. Defining this type of project upstream avoids most unpleasant surprises; this is often where an outside perspective saves time by transforming a vague risk into verifiable checkpoints.

FAQ on the security of AI-generated code

Is AI-generated code always less secure?

No. It may be correct, especially on simple, well-specified tasks. The risk comes from the fact that it can also produce vulnerable code with great confidence, hence the need for a systematic review.

What tools should you use to check AI code?

The most common are SonarQube, GitHub CodeQL, Semgrep, Dependabot, Snyk, npm audit, or pip-audit. They detect known vulnerabilities, risky dependencies, and quality issues, but do not replace a business review.

Should ChatGPT or Copilot be banned for developers?

In most SMEs, a total ban is not very realistic. It is better to define the authorized uses, exclude secrets and sensitive data from prompts, and then impose checks before each integration.

How much time should be added to the schedule to secure AI code?

For a small scope, often allow two to five business days for targeted checks. For a business application with authentication, roles, and personal data, the review must be integrated throughout development.