
Case studies

Marketing copy is easy to produce. Measured behavior on real codebases is harder. This section is the harder kind.

Each case study below is a real run of cix against real code, with methodology, raw findings, and honest discussion of where cix excelled and where it fell short.

Pre-release cleanup: cix vs. a symbol-only indexer

A Laravel 12 + Vue 3 SPA, mid-rename, with a half-propagated refactor in flight. The same cleanup prompt was run twice — once with a symbol-only indexer and once with cix — by the same evaluator, cold start each time.

Highlights:

  • Eight findings in common.
  • Three additional findings only cix surfaced — including a forgotten Python script in a publicly served directory that hardcoded root MySQL credentials.
  • Roughly half the tool calls (29 vs. 55).
  • Roughly a third of the tokens (30–40k vs. 80–100k).

Read the full case study →

The 25-repo benchmark

A standardized evaluation rubric run against 25 real-world open source projects spanning Python, TypeScript, Go, Java, C#, PHP, and Ruby. Each repo was scored on a 15-point rubric covering project understanding, change-impact analysis, and breakage analysis.

Highlights:

  • 15/15 on a small Flask CRUD app — clean sweep across all rubric dimensions.
  • 14/15 on standard backend stacks — Flask variants, FastAPI, Django, monolithic Rails-style apps.
  • 13–14/15 on multi-thousand-file backends — Spring Boot, ASP.NET Core, large Laravel apps, Apache Airflow.
  • 10–13/15 on large unconventional codebases — Loki, NetBox, Strapi, Saleor.
  • 1–4/15 on Lua-heavy codebases — Kong scored low because cix has no Lua parser yet. The system was honest about why.

The benchmark also surfaced specific gaps worth tracking — dynamic-route registration patterns that leave the reconstructed route view partial, same-name symbol disambiguation in some impact queries, and the Lua parser gap.

Read the full benchmark →

Apache Airflow deep-dive

A focused walkthrough of cix on Apache Airflow — a large Python monorepo with 100+ provider packages, multiple FastAPI applications, and 74 database tables.

Highlights:

  • Detected the FastAPI/Alembic/SQLAlchemy stack correctly.
  • Surfaced 209 HTTP routes and the schema across 74 tables in a single orientation query.
  • Identified entry points, request surfaces, and architecture boundaries without falling back to grep.
  • Honestly flagged where the schema parser was incomplete (some Alembic migrations alter existing tables rather than create them, and the parser misses those).

This case is included to show how the system behaves on a project at the upper end of size and complexity — and where the partial-coverage signals show up.

Read the deep-dive →

What these have in common

A consistent pattern emerges across the cases:

  • On standard stacks, cix dramatically reduces the work an assistant has to do — fewer file reads, fewer tokens, more structured analysis, more catches.
  • On non-standard stacks, the picture is mixed. The system is honest about what it can and can't see; the user is on solid ground when deciding how much to trust each query.
  • On unsupported languages, the system fails openly. There is no scenario where cix returns confident-looking nonsense when the underlying parser doesn't exist.

That last property is what makes the others trustworthy. A tool that gracefully says "I don't know" is a tool you can build a workflow around.
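The "fails openly" property can be sketched generically. This is a hypothetical shape — the names and types below are illustrative, not cix's actual API — showing the difference between returning an explicit no-coverage result and returning an empty-but-confident one:

```python
from dataclasses import dataclass, field

# Hypothetical set of parser-backed languages, for illustration only.
SUPPORTED = {"python", "typescript", "go", "java", "csharp", "php", "ruby"}

@dataclass
class IndexResult:
    findings: list = field(default_factory=list)
    coverage: str = "full"   # "full", "partial", or "none"
    note: str = ""

def query_index(language: str, findings: list) -> IndexResult:
    # Fail openly: an unsupported language yields an explicit
    # "none" coverage signal plus a reason, never a silent empty
    # result that looks like "no problems found".
    if language not in SUPPORTED:
        return IndexResult(
            findings=[],
            coverage="none",
            note=f"no {language} parser; structural results would be unreliable",
        )
    return IndexResult(findings=findings, coverage="full")
```

A caller can then gate its confidence on the coverage field instead of guessing whether an empty answer means "clean" or "blind" — which is exactly the distinction the Kong result above depends on.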

Where to go next