Validating AI-Generated Code with SonarQube and Fallow

22 May 2026

AI coding tools are genuinely useful. They also write code that’s confidently wrong in ways that are easy to miss on a quick read. Running SonarQube on everything has become a normal part of how I work, not an optional extra. It’s open source and self-hostable, and it fits naturally into a CI pipeline. If you don’t want to run your own instance, SonarQube Cloud has a free tier that covers public repositories and up to 50,000 lines of private code, and it connects to GitHub in a few clicks.

The problem with AI code

AI generates code that looks reasonable. The variable names make sense, the structure is familiar, it does approximately what you asked for. What it doesn’t do reliably is care about your specific codebase: what’s actually called at runtime, what’s dead weight, where security issues are hiding in the plumbing.

The output also tends to be generous. It’ll add error handling for cases that can’t happen, introduce abstractions for a single use site, and occasionally import something that never gets used. A few dozen AI-assisted commits in, the signal-to-noise ratio drops noticeably.

Duplication

AI writes duplicated code constantly. Not obvious copy-paste duplication, but something subtler. It’ll implement the same validation logic in three different files because each time you asked it to add a feature, it solved the problem locally without knowing the solution already existed elsewhere. It has no memory of what it wrote last week in a different context, and it doesn’t go looking.

SonarQube’s duplication detection catches token-level duplication: blocks of code that are textually near-identical, possibly with minor variations. Every duplicated block is a second place to fix the next bug and a second place where behaviour can quietly diverge.

SonarQube’s duplication detection works on token sequences, so it won’t reliably flag two functions that do the same thing but use different variable names, different control flow, or a slightly different implementation shape. If you’re on a TypeScript project, Fallow’s semantic analysis mode can catch this; it’s covered further down.
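As a hypothetical illustration of that kind of duplication, the two functions below (names and rule invented for this sketch) apply the same email check with different names and different control flow. They return the same result for every input, but a detector that compares token sequences would likely treat them as unrelated.

```typescript
// Hypothetical example: one validation rule, written twice in different shapes.

// Written for a signup form: nested conditions, single exit point.
function isValidSignupEmail(input: string): boolean {
  let ok = false;
  const trimmed = input.trim();
  if (trimmed.length > 0) {
    const at = trimmed.indexOf("@");
    // Exactly one "@", and it must not be the first character.
    if (at > 0 && at === trimmed.lastIndexOf("@")) {
      ok = trimmed.slice(at + 1).includes(".");
    }
  }
  return ok;
}

// Written later for an invite flow: guard clauses and split().
function checkInviteAddress(address: string): boolean {
  const parts = address.trim().split("@");
  if (parts.length !== 2) return false; // zero or several "@"
  const [local, domain] = parts;
  if (local.length === 0) return false; // "@" was the first character
  return domain.includes(".");
}
```

Fixing a bug in one of these (say, rejecting a domain that is just a dot) leaves the other silently wrong, which is the divergence risk described above.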

What SonarQube catches

Linters check files. TypeScript checks types. SonarQube checks the codebase. It doesn’t use AI to invent findings; it produces deterministic evidence that humans and agents can inspect.

In practice it covers the static analysis layer: security vulnerabilities, code smells, coverage, and bugs that pattern-match against its ruleset. For AI-generated code specifically, it catches things like:

Hardcoded credentials that slipped in during a scaffolding step
SQL or command injection patterns the AI assembled without thinking about context
Dead code that got introduced but never cleaned up
Duplicated string literals: AI will repeat the same string across a file rather than extracting a constant. SonarQube flags these because any change needs to be propagated to every occurrence, which is exactly the kind of thing that gets missed.
High cognitive complexity: AI tends to solve problems by nesting conditions and early returns rather than simplifying control flow. SonarQube measures how hard a function is to follow and flags it when the score gets too high. Code with high cognitive complexity is hard to read, test, and modify, which matters more when you didn’t write it yourself.
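To make the cognitive complexity point concrete, here is a hypothetical sketch; the function names, the shipping rules, and the prices are all invented for illustration. Both functions return the same result. The first buries the rules in nested branches, which is the shape a nesting-sensitive metric penalises. The second states them as a flat, ordered rule table.

```typescript
type Order = { total: number; express: boolean; member: boolean };

// Nested version: the reader has to track three levels of conditions at once.
function shippingNested(order: Order): number {
  if (order.member) {
    if (order.express) {
      return 5;
    } else {
      return 0;
    }
  } else {
    if (order.total >= 100) {
      if (order.express) {
        return 10;
      } else {
        return 0;
      }
    } else {
      if (order.express) {
        return 15;
      } else {
        return 5;
      }
    }
  }
}

// Flat version: rules are checked top to bottom, and the first match wins.
type Rule = { when: (o: Order) => boolean; cost: number };

const shippingRules: Rule[] = [
  { when: (o) => o.member && o.express, cost: 5 },
  { when: (o) => o.member, cost: 0 },
  { when: (o) => o.total >= 100 && o.express, cost: 10 },
  { when: (o) => o.total >= 100, cost: 0 },
  { when: (o) => o.express, cost: 15 },
];

function shippingFlat(order: Order): number {
  const rule = shippingRules.find((r) => r.when(order));
  return rule ? rule.cost : 5; // default: small non-member standard order
}
```

Whether a rule table is the right refactoring depends on the code; the point is only that the same behaviour can be expressed without the nesting.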

The quality gate is the useful part: a hard pass/fail on each PR means issues can’t quietly accumulate. SonarQube is wired into the Codeberg CI/CD pipeline, so every push gets scanned automatically. The analysis runs, and a failed gate blocks the build. No manual step, no temptation to skip it when you’re in a hurry.
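For pipelines that need to check the gate themselves, a minimal sketch might look like the following. It assumes SonarQube’s `api/qualitygates/project_status` Web API endpoint, Bearer token authentication, and a runtime with a global `fetch` (for example Node 18 or later). The function names and the choice of fields to read are mine, not part of any official client.

```typescript
// Only the parts of the response this sketch reads; other fields are ignored.
interface GateCondition {
  status: string;
  metricKey: string;
  actualValue?: string;
  errorThreshold?: string;
}
interface GateResponse {
  projectStatus: { status: string; conditions?: GateCondition[] };
}

// Pure helper: decide pass/fail and describe each failing condition.
function evaluateGate(res: GateResponse): { passed: boolean; failures: string[] } {
  const failures = (res.projectStatus.conditions ?? [])
    .filter((c) => c.status === "ERROR")
    .map((c) => `${c.metricKey}: actual ${c.actualValue ?? "?"}, threshold ${c.errorThreshold ?? "?"}`);
  return { passed: res.projectStatus.status === "OK", failures };
}

// Network wrapper (hypothetical helper): fetch the gate status for one project.
async function checkGate(baseUrl: string, projectKey: string, token: string): Promise<boolean> {
  const url = `${baseUrl}/api/qualitygates/project_status?projectKey=${encodeURIComponent(projectKey)}`;
  const response = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!response.ok) throw new Error(`SonarQube returned HTTP ${response.status}`);
  const result = evaluateGate((await response.json()) as GateResponse);
  result.failures.forEach((f) => console.error(`Quality gate condition failed: ${f}`));
  return result.passed;
}
```

A CI step could call `checkGate` and exit non-zero when it resolves to `false`. In many setups the scanner’s own `sonar.qualitygate.wait=true` property achieves the same effect without custom code.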

My employer, Orange Business, provides vCD (VMware Cloud Director) environments where the SonarQube installation and pipeline infrastructure both live, which makes this practical to run without carving out resources elsewhere.

Fallow for TypeScript codebases

For TypeScript and JavaScript projects, Fallow is worth considering alongside SonarQube. It’s also open source and self-hostable, and it approaches the codebase differently: codebase intelligence rather than rule-based static analysis, with optional runtime data about hot paths and cold paths.

SonarQube asks “is this code safe and clean?” and Fallow asks “does this code reflect how the system actually behaves?” Fallow surfaces which code paths are actually exercised at runtime, making it easier to spot redundant logic that looks distinct on the surface but serves the same purpose. It also helps with prioritisation: refactoring a cold path that nothing calls carries a different risk from touching something hot.

Duplication detection

Running fallow dupes out of the box gives you clone groups, meaning blocks of structurally identical code across the codebase found via suffix-array analysis:

$ fallow dupes

● Duplicates (3 clone groups)

     57 lines  2 instances
    src/components/Calendar/CalendarMonth.stories.tsx:597-653
    src/components/Calendar/CalendarYear.stories.tsx:818-874

     42 lines  3 instances
    src/features/forecasting/server/procedures/analytics.ts:141-181
    src/features/forecasting/server/procedures/cashflow.ts:153-194
    src/features/forecasting/server/procedures/income.ts:590-631

  Identical code blocks detected via suffix-array analysis

✓ 27,255 lines (19.4%) duplicated across 398 files (0.23s)

That’s the default mode. For the semantic duplication that SonarQube misses (code that was copied and then adapted with renamed variables and different literal values), there’s --mode semantic:

$ fallow dupes --mode semantic

This uses token-type normalization to match structurally equivalent code regardless of what the variables are called. It produces more matches than the default, some of which are intentional, so it’s best used when you specifically want to catch adapted clones rather than as a starting point.
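The idea behind token-type normalization can be shown with a toy sketch. This is not Fallow’s implementation: the tokenizer, the keyword list, and the `ID`/`NUM`/`STR` placeholders are simplifications invented here to show why renamed variables and changed literals stop mattering once tokens are reduced to their types.

```typescript
// Toy illustration only; a real tool would use a proper lexer for the language.
const KEYWORDS = new Set(["const", "let", "function", "return", "if", "else", "for", "while"]);

function normalize(source: string): string[] {
  // Match identifiers, numbers, quoted strings, or any single non-space character.
  const tokens = source.match(/[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|"[^"]*"|'[^']*'|\S/g) ?? [];
  return tokens.map((t) => {
    if (KEYWORDS.has(t)) return t;          // keywords carry structure, keep them
    if (/^[A-Za-z_$]/.test(t)) return "ID"; // any identifier, whatever its name
    if (/^\d/.test(t)) return "NUM";        // any numeric literal
    if (/^["']/.test(t)) return "STR";      // any string literal
    return t;                               // punctuation and operators stay as-is
  });
}

// Two snippets have the same shape if their normalized token streams are equal.
function sameShape(a: string, b: string): boolean {
  const na = normalize(a);
  const nb = normalize(b);
  return na.length === nb.length && na.every((t, i) => t === nb[i]);
}

const original = 'const total = price * 1.2; if (total > 100) return "big";';
const adapted = 'const sum = cost * 1.5; if (sum > 250) return "large";';
const different = 'const total = price + 1.2; if (total > 100) return "big";';

console.log(sameShape(original, adapted));   // true: only names and literals differ
console.log(sameShape(original, different)); // false: the operator changed
```

The same property explains the extra matches: any two snippets with the same shape look identical after normalization, whether or not one was copied from the other.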

Codebase intelligence

Beyond duplication, Fallow’s runtime data is useful context in its own right. Before asking AI to refactor a function, it helps to know whether that function is called constantly or barely at all.

Both SonarQube and Fallow expose MCP servers, so either tool, or both, can feed findings back into the same AI session that produced the code in the first place. Instead of waiting for CI to report back, the AI can query current issues, duplication findings, and codebase intelligence as part of writing the code, and check its own output before anything gets pushed.
