
# Benchmarks

A condensed view of cix's measured performance across real-world projects. The full per-project breakdown is in the 25-repo benchmark case study; this page is the at-a-glance summary.

## The score distribution

Twenty-five projects, scored on a 15-point rubric covering project understanding, architecture, schema/routes/handlers/external interfaces, change-impact, and breakage analysis.

| Score band | Projects | Pattern |
| --- | --- | --- |
| 15/15 | 1 | Clean sweep on a small Flask CRUD app |
| 14/15 | 8 | Standard backends in supported languages |
| 13/15 | 4 | Larger, more complex backends including Apache Airflow |
| 11–12/15 | 2 | Monorepos with mixed structure |
| 9–10/15 | 7 | Large unconventional codebases or projects with dynamic registration patterns |
| 1–4/15 | 4 | Lua-heavy codebases (no Lua parser today) |

Reading the distribution: fourteen of twenty-five projects scored 13 or above. Twenty-one of twenty-five scored 9 or above. The lowest scores are concentrated in one project family — Kong — where the language coverage gap is the binding constraint.

## By language

| Language | Projects in the set | Typical score range | Notes |
| --- | --- | --- | --- |
| Python | Flask, FastAPI, Django, Airflow, NetBox, demo apps | 13–15 | Highest-confidence stack |
| TypeScript / JavaScript | Vue, Express, Cal.com, Strapi, Saleor, Directus | 10–14 | Strong on standard apps; partial on dynamic routing |
| Java | Spring Boot | 14 | Schema and route coverage clean |
| C# | ASP.NET Core | 13 | Strong on conventional .NET projects |
| PHP | Laravel, Monica | 14 | Migration-driven schema view works well |
| Go | Loki, rclone, Turbo | 9–11 | Static stack works; dynamic handler registration is partial |
| Lua | Kong (multiple runs) | 1–4 | No Lua parser today; the system says so |

## By project shape

**Small CRUD apps.** Score range 14–15. Schema, routes, and impact analysis all clean.

**Standard MVC backends.** Score range 13–14. Whether Laravel, Django, Rails-style, or Spring Boot, the system handles them well.

**Large monorepos with conventional structure.** Score 13. Airflow is the test case. Multi-component systems with explicit boundaries remain tractable.

**Large monorepos with dynamic structure.** Score range 9–11. Loki and rclone are the test cases. The static parts work; the dynamic parts return partial results that are honestly labeled.

**Plugin hosts and gateways.** Score range 1–4 in Lua, roughly 12 where a Python- or Java-shaped equivalent exists. The current weak point.

What "high score" actually means

A 14/15 means cix produced grounded, accurate answers on every dimension of the rubric, with at most a single small gap (typically a missing route, an incomplete schema column, or a low-confidence impact resolution). It means an AI assistant working on this project with cix in the loop has the information it needs to behave well.

A 10/15 means cix produced grounded answers on most dimensions, with some areas requiring fallback to file reading. The assistant still gets value, but more of the work happens through other tools.

A 4/15 means the index is largely empty for this project — cix is operating below its useful threshold, and the assistant should treat its output as suggestive rather than authoritative. This is the honest signal that says: "use a different approach for this project."
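
For concreteness, here is a minimal sketch of how a total like 14/15 decomposes. The split of five dimensions at 0–3 points each is our illustration of the rubric described above, not the benchmark's published scoring code:

```python
# Illustrative only: we assume five rubric dimensions worth 0-3 points
# each, totaling 15. The real scoring breakdown may differ.
DIMENSIONS = (
    "project understanding",
    "architecture",
    "schema, routes, handlers, external interfaces",
    "change impact",
    "breakage analysis",
)

def rubric_total(scores: dict[str, int]) -> int:
    """Sum per-dimension scores (0-3 each) into the 15-point total."""
    assert set(scores) == set(DIMENSIONS), "score every dimension exactly once"
    assert all(0 <= s <= 3 for s in scores.values()), "each dimension is 0-3"
    return sum(scores.values())

# A hypothetical 14/15 run: grounded on every dimension, with one small
# gap (e.g. a missing route) costing a single point.
example_14_of_15 = {
    "project understanding": 3,
    "architecture": 3,
    "schema, routes, handlers, external interfaces": 2,  # one missing route
    "change impact": 3,
    "breakage analysis": 3,
}
print(rubric_total(example_14_of_15))  # -> 14
```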

The fact that the system gives you these signals — rather than producing 14/15-shaped output everywhere regardless of actual coverage — is the reason the high scores are trustworthy.

## What we are working to improve

The benchmark surfaced specific, addressable gaps:

- Lua parser support. Would lift Kong-class projects from 1–4 to roughly 11–13, based on the same rubric.
- Dynamic-handler extraction. Would lift Loki and rclone from 9–11 toward 13; the sketch below shows the pattern involved.
- Same-name symbol disambiguation in impact analysis. Would lift several projects by a point.
- Test-fixture route filtering. Would clean up the route view on monorepos with extensive test infrastructure.

None of these are conceptual blockers. They are work, scoped and prioritized in the roadmap.
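
To make the dynamic-handler gap concrete, the toy registry below is illustrative Python, not code from Loki or rclone (which hit the same pattern in Go). The first registration style leaves a literal path at the call site for a static parser to read; the second assembles paths at runtime, which is why extraction there is partial:

```python
# Illustrative only: a toy route registry, not any real project's code.
routes = {}

def route(path):
    """Static style: the path is a literal at the decorator site,
    so a static parser can read '/health' straight out of the source."""
    def register(handler):
        routes[path] = handler
        return handler
    return register

@route("/health")
def health():
    return "ok"

# Dynamic style: paths and handlers are assembled at runtime from data.
# No literal path sits next to any handler definition, so a static index
# can only report that registration happens here, not what gets registered.
RESOURCES = ["users", "orders", "invoices"]

def make_handler(name):
    return lambda: f"list of {name}"

for name in RESOURCES:
    routes[f"/{name}"] = make_handler(name)

print(sorted(routes))  # ['/health', '/invoices', '/orders', '/users']
```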

## What we are not promising

- A 14/15 score on every project. The benchmark shows the score depends on the project's shape and language. Run cix on yours before assuming.
- Coverage of every language. The set of supported parsers is finite today; we are honest about which ones.
- Fully automated indexing of dynamic patterns. Some surface remains static-analysis-invisible by nature.

## How to use this page

If your project is a standard backend in Python, Java, C#, PHP, JavaScript, or TypeScript, the benchmark suggests cix will deliver high value with low setup cost. If your project is in Lua, primarily uses dynamic dispatch, or sits at an unusual boundary (custom routing layer, exotic ORM), use the benchmark scores as a calibration: cix will help, but partially, and you should run a small evaluation before committing.
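
One way to run that small evaluation is a plain checklist like the sketch below. The probe questions are placeholders, not part of the benchmark; replace them with questions that matter for your codebase, ask cix each one yourself, and record whether the answer was grounded:

```python
# Minimal pre-adoption check, illustrative only. Swap the placeholder
# probes for questions that matter for your codebase, ask cix each one,
# and record whether the answer was grounded and correct.
PROBES = [
    "Does it list the project's real route table?",
    "Does it reproduce the schema of a core table?",
    "Does it trace the impact of renaming a widely used function?",
    "Does it say clearly what it cannot see (dynamic dispatch, plugins)?",
]

results = {}
for probe in PROBES:
    reply = input(f"{probe} [y/n] ").strip().lower()
    results[probe] = reply == "y"

passed = sum(results.values())
print(f"{passed}/{len(PROBES)} probes grounded")
if passed < len(PROBES):
    print("Expect partial coverage; read the Limitations page before adopting.")
```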

Always-applicable guidance: read the Limitations page before adopting. We say what doesn't work as clearly as we say what does.

## Related