
DOCS · CASE STUDIES

Measured behavior on real codebases.

Marketing copy is easy to produce. Side-by-side runs on real code are harder. Each case study below is a real run, with methodology, raw findings, and honest discussion of where cix excelled and where it fell short.


Case study · 2026-04

Pre-release cleanup vs. a less-capable indexer

Read the full case study →

Laravel 12 + Vue 3 SPA, mid-rename. Same prompt run twice, cold start each time: once with cix, once with the baseline indexer. Eight findings in common. Three additional findings only cix surfaced, including a forgotten Python script in /public hardcoding root MySQL credentials.

Tool calls (cix vs. baseline): 29 / 55
Tokens (cix vs. baseline): 30–40k / 80–100k
Findings unique to cix: 3

Case study · 2026-04

The 25-repo benchmark

Read the full case study →

A standardized rubric run against 25 real-world open source projects across Python, TypeScript, Go, Java, C#, PHP, and Ruby. 15-point scoring for project understanding, change-impact analysis, and breakage analysis.

Small Flask CRUD app: 15 / 15
Standard backend stacks: 13–14 / 15
Lua-heavy projects (no parser yet): 1–4 / 15
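
How the 15 points break down across the three categories isn't spelled out on this page, so the split below is an assumption. A minimal Python sketch of a per-repo score, with a hypothetical 5-point cap per category and the Flask CRUD result above as the worked example:

from dataclasses import dataclass

# Sketch of one rubric entry. The three categories come from the case study;
# the equal 5-point cap per category is an assumption, not documented.
@dataclass
class RubricScore:
    project_understanding: int  # 0-5 (assumed range)
    change_impact: int          # 0-5 (assumed range)
    breakage_analysis: int      # 0-5 (assumed range)

    def total(self) -> int:
        return self.project_understanding + self.change_impact + self.breakage_analysis

# Hypothetical entry matching the "Small Flask CRUD app" row above.
flask_crud = RubricScore(project_understanding=5, change_impact=5, breakage_analysis=5)
assert flask_crud.total() == 15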

Case study · 2026-04

Apache Airflow deep-dive

Read the full case study →

A focused walkthrough on a large Python monorepo: 100+ provider packages, multiple FastAPI applications, 74 database tables. The schema parser had only partial coverage of Alembic migrations that mutate existing tables rather than create new ones, and it reported that gap instead of guessing.

HTTP routes detected: 209
Database tables: 74
Stack detection: FastAPI / Alembic / SQLAlchemy
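
The qualifier about Alembic is worth unpacking. A migration that creates a table states the full schema in one call, which a static parser can read directly; a migration that mutates an existing table only states a delta, so recovering the final shape means replaying every earlier delta in order. A minimal sketch of the two styles, using hypothetical table and column names rather than Airflow's actual migrations:

import sqlalchemy as sa
from alembic import op

def upgrade():
    # "Create" style: the whole schema is visible in a single call.
    op.create_table(
        "audit_log",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("message", sa.Text, nullable=False),
    )
    # "Mutate" style: only the delta is visible; the final shape of "users"
    # depends on every migration that has ever touched it.
    op.add_column("users", sa.Column("last_login", sa.DateTime(), nullable=True))

def downgrade():
    op.drop_column("users", "last_login")
    op.drop_table("audit_log")

The second style is where coverage was partial, and the parser reported the gap rather than inventing columns.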

What these have in common

On standard stacks, cix dramatically reduces the work an assistant has to do. On non-standard stacks, the picture is mixed — and the system is honest about what it can and can't see. On unsupported languages, the system fails openly rather than fabricating. That last property is what makes the others trustworthy.