
# Benchmarks

A condensed view of cix's measured performance across real-world projects. The full per-project breakdown is in the 25-repo benchmark case study; this page is the at-a-glance summary.

## The score distribution

Twenty-five projects, scored on a 15-point rubric covering project understanding, architecture, schema/routes/handlers/external interfaces, change-impact, and breakage analysis.

| Score band | Projects | Pattern |
| --- | --- | --- |
| 15/15 | 1 | Clean sweep on a small Flask CRUD app |
| 14/15 | 8 | Standard backends in supported languages |
| 13/15 | 4 | Larger, more complex backends including Apache Airflow |
| 11–12/15 | 2 | Monorepos with mixed structure |
| 9–10/15 | 7 | Large unconventional codebases or projects with dynamic registration patterns |
| 1–4/15 | 4 | Lua-heavy codebases (no Lua parser today) |

Reading the distribution: fourteen of twenty-five projects scored 13 or above. Twenty-one of twenty-five scored 9 or above. The lowest scores are concentrated in one project family — Kong — where the language coverage gap is the binding constraint.

## By language

| Language | Projects in the set | Typical score range | Notes |
| --- | --- | --- | --- |
| Python | Flask, FastAPI, Django, Airflow, NetBox, demo apps | 13–15 | Highest-confidence stack |
| TypeScript / JavaScript | Vue, Express, Cal.com, Strapi, Saleor, Directus | 10–14 | Strong on standard apps; partial on dynamic routing |
| Java | Spring Boot | 14 | Schema and route coverage clean |
| C# | ASP.NET Core | 13 | Strong on conventional .NET projects |
| PHP | Laravel, Monica | 14 | Migration-driven schema view works well |
| Go | Loki, rclone, Turbo | 9–11 | Static stack works; dynamic handler registration is partial |
| Lua | Kong (multiple runs) | 1–4 | No Lua parser today; the system says so |

## By project shape

**Small CRUD apps.** Score range 14–15. Schema, routes, and impact analysis all clean.

**Standard MVC backends.** Score range 13–14. Whether Laravel, Django, Rails-style, or Spring Boot, the system handles them well.

**Large monorepos with conventional structure.** Score 13. Airflow is the test case. Multi-component systems with explicit boundaries remain tractable.

**Large monorepos with dynamic structure.** Score range 9–11. Loki and rclone are the test cases. The static parts work; the dynamic parts return partial results that are honestly labeled.

**Plugin hosts and gateways.** Score range 1–4 in Lua, roughly 12 where a Python- or Java-shaped equivalent exists. The current weak point.

What "high score" actually means

A 14/15 means cix produced grounded, accurate answers on every dimension of the rubric, with at most a single small gap (typically a missing route, an incomplete schema column, or a low-confidence impact resolution). It means an AI assistant working on this project with cix in the loop has the information it needs to behave well.

A 10/15 means cix produced grounded answers on most dimensions, with some areas requiring fallback to file reading. The assistant still gets value, but more of the work happens through other tools.

A 4/15 means the index is largely empty for this project — cix is operating below its useful threshold, and the assistant should treat its output as suggestive rather than authoritative. This is the honest signal that says: "use a different approach for this project."
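
For concreteness, here is a minimal sketch of how a total like 14/15 decomposes. The split of five dimensions at 0–3 points each is our illustration of the rubric described above, not the benchmark's published scoring code:

```python
# Illustrative only: we assume five rubric dimensions worth 0-3 points
# each, totaling 15. The real scoring breakdown may differ.
DIMENSIONS = (
    "project understanding",
    "architecture",
    "schema, routes, handlers, external interfaces",
    "change impact",
    "breakage analysis",
)

def rubric_total(scores: dict[str, int]) -> int:
    """Sum per-dimension scores (0-3 each) into the 15-point total."""
    assert set(scores) == set(DIMENSIONS), "score every dimension exactly once"
    assert all(0 <= s <= 3 for s in scores.values()), "each dimension is 0-3"
    return sum(scores.values())

# A hypothetical 14/15 run: grounded on every dimension, with one small
# gap (e.g. a missing route) costing a single point.
example_14_of_15 = {
    "project understanding": 3,
    "architecture": 3,
    "schema, routes, handlers, external interfaces": 2,  # one missing route
    "change impact": 3,
    "breakage analysis": 3,
}
print(rubric_total(example_14_of_15))  # -> 14
```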

The fact that the system gives you these signals — rather than producing 14/15-shaped output everywhere regardless of actual coverage — is the reason the high scores are trustworthy.

## What we are working to improve

The benchmark surfaced specific, addressable gaps:

- Lua parser support. Would lift Kong-class projects from 1–4 to roughly 11–13, based on the same rubric.
- Dynamic-handler extraction. Would lift Loki and rclone from 9–11 toward 13; the sketch below shows the pattern involved.
- Same-name symbol disambiguation in impact analysis. Would lift several projects by a point.
- Test-fixture route filtering. Would clean up the route view on monorepos with extensive test infrastructure.

None of these are conceptual blockers. They are work, scoped and prioritized in the roadmap.
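
To make the dynamic-handler gap concrete, the toy registry below is illustrative Python, not code from Loki or rclone (which hit the same pattern in Go). The first registration style leaves a literal path at the call site for a static parser to read; the second assembles paths at runtime, which is why extraction there is partial:

```python
# Illustrative only: a toy route registry, not any real project's code.
routes = {}

def route(path):
    """Static style: the path is a literal at the decorator site,
    so a static parser can read '/health' straight out of the source."""
    def register(handler):
        routes[path] = handler
        return handler
    return register

@route("/health")
def health():
    return "ok"

# Dynamic style: paths and handlers are assembled at runtime from data.
# No literal path sits next to any handler definition, so a static index
# can only report that registration happens here, not what gets registered.
RESOURCES = ["users", "orders", "invoices"]

def make_handler(name):
    return lambda: f"list of {name}"

for name in RESOURCES:
    routes[f"/{name}"] = make_handler(name)

print(sorted(routes))  # ['/health', '/invoices', '/orders', '/users']
```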

## What we are not promising

- A 14/15 score on every project. The benchmark shows the score depends on the project's shape and language. Run cix on yours before assuming.
- Coverage of every language. The set of supported parsers is finite today; we are honest about which ones.
- Fully automated indexing of dynamic patterns. Some surface remains static-analysis-invisible by nature.

## How to use this page

If your project is a standard backend in Python, Java, C#, PHP, JavaScript, or TypeScript, the benchmark suggests cix will deliver high value with low setup cost. If your project is in Lua, primarily uses dynamic dispatch, or sits at an unusual boundary (custom routing layer, exotic ORM), use the benchmark scores as a calibration: cix will help, but partially, and you should run a small evaluation before committing.
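
One way to run that small evaluation is a plain checklist like the sketch below. The probe questions are placeholders, not part of the benchmark; replace them with questions that matter for your codebase, ask cix each one yourself, and record whether the answer was grounded:

```python
# Minimal pre-adoption check, illustrative only. Swap the placeholder
# probes for questions that matter for your codebase, ask cix each one,
# and record whether the answer was grounded and correct.
PROBES = [
    "Does it list the project's real route table?",
    "Does it reproduce the schema of a core table?",
    "Does it trace the impact of renaming a widely used function?",
    "Does it say clearly what it cannot see (dynamic dispatch, plugins)?",
]

results = {}
for probe in PROBES:
    reply = input(f"{probe} [y/n] ").strip().lower()
    results[probe] = reply == "y"

passed = sum(results.values())
print(f"{passed}/{len(PROBES)} probes grounded")
if passed < len(PROBES):
    print("Expect partial coverage; read the Limitations page before adopting.")
```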

Always-applicable guidance: read the Limitations page before adopting. We say what doesn't work as clearly as we say what does.

## Related