
For evaluators

You are a technical decision-maker — staff engineer, principal, eng lead, or CTO at a smaller org — and you've been asked to assess whether cix is worth adopting on a real codebase. You want concrete evidence, a defensible evaluation method, and an honest read of the limits before committing budget or attention.

This page is for that decision. It assumes you'd rather measure than be sold to.

The problem cix is trying to solve for you

Your team uses AI coding assistants. They produce code that compiles, passes the local tests, and looks right in review. They also produce code that quietly duplicates existing helpers, drops files in the wrong folders, references fabricated schema, and burns tokens reading entire files when a targeted query could have answered the same question.

Each individual mistake is small. The cumulative cost is real — in PR review time, in convention drift, in tokens, and occasionally in incidents (a missed created_at column in production, a credential leak in a publicly served directory).

cix's claim is narrow and specific: an AI assistant working inside a project with cix indexed and configured exhibits measurably more grounded behavior than the same assistant without it. The right way to evaluate that claim is to measure it on your own code.

Why this is better than what you have

Briefly, since Comparison covers it in full:

  • vs. grep/ripgrep: the assistant's grep usage gets replaced by structured queries that return symbols, not text matches. Keep grep for human use.
  • vs. an LSP: LSPs run inside your editor; AI assistants don't see them. cix is the structured-context layer for the assistant.
  • vs. RAG/vector search: cix returns symbols and structure, not similarity-ranked chunks. Less noise, structured metadata, and answers RAG can't give (impact, schema, route).
  • vs. CLAUDE.md/AGENTS.md instruction files: complementary. Static docs describe; cix grounds and verifies.
  • vs. fine-tuning: cheaper, transparent, portable across assistants, doesn't age out with every refactor.
  • vs. doing nothing: doing nothing has a real cost paid in small increments. cix is the smallest intervention that addresses the structural part.

A one-week evaluation plan

This is what we recommend before any adoption decision. It is deliberately small.

Day 1 — install and orient

  • Install on one engineer's machine (a few minutes).
  • Initialize on one real project of yours, ideally one with a database, real routes, and at least a few hundred files.
  • Read the inferred conventions. Read the orientation output. Confirm the system identified the stack correctly and parsed cleanly.

Days 2–3 — run a representative task

  • Pick a task that would normally take 20–60 minutes (an endpoint addition, a small bug fix, a small refactor).
  • Have the assistant do it once with cix in the loop and once without — same prompt, fresh sessions.
  • Record file reads, tokens, and whether the output is correct on the first attempt.
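The paired-run protocol above is easier to keep honest if each session is logged the same way. A minimal sketch, assuming you tally file reads and tokens by hand from each session's transcript; the field names and example figures are placeholders, not measured results:

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """One assistant session: same task, same prompt, fresh context."""
    label: str                   # e.g. "with-cix" or "baseline"
    files_read: int              # file-read tool calls counted in the transcript
    tokens: int                  # total tokens the session consumed
    first_attempt_correct: bool  # matched team conventions without rework?

def compare(with_cix: RunRecord, baseline: RunRecord) -> dict:
    """Percent reduction in reads and tokens, plus correctness of each run."""
    def pct_drop(after: int, before: int) -> float:
        return round(100 * (before - after) / before, 1) if before else 0.0
    return {
        "files_read_drop_pct": pct_drop(with_cix.files_read, baseline.files_read),
        "tokens_drop_pct": pct_drop(with_cix.tokens, baseline.tokens),
        "correct_with_cix": with_cix.first_attempt_correct,
        "correct_baseline": baseline.first_attempt_correct,
    }

# Placeholder numbers for illustration only.
summary = compare(
    RunRecord("with-cix", files_read=6, tokens=18_000, first_attempt_correct=True),
    RunRecord("baseline", files_read=23, tokens=54_000, first_attempt_correct=False),
)
print(summary)
```

Keeping both runs in one record per task makes the Day 5 read-out a matter of printing a handful of dicts rather than reconstructing sessions from memory.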

Day 4 — run a refactor or cleanup

  • Pick a change with non-trivial blast radius — a rename across many files, or a pre-release cleanup pass.
  • Compare coverage and confidence between the cix run and a baseline.
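Coverage here is just set arithmetic: build the list of sites the change must touch (for example from a ripgrep pass before the run), then diff it against what each run actually edited. A small sketch; the file paths are invented placeholders:

```python
def refactor_coverage(expected: set[str], touched: set[str]) -> tuple[float, set[str]]:
    """Fraction of expected change sites actually touched, plus the stragglers."""
    stragglers = expected - touched
    coverage = 1 - len(stragglers) / len(expected)
    return coverage, stragglers

# Hypothetical rename with three known call sites; the run missed one.
expected = {
    "app/Http/UserController.php:42",
    "resources/js/User.vue:17",
    "tests/UserTest.php:88",
}
touched = {"app/Http/UserController.php:42", "resources/js/User.vue:17"}

coverage, missed = refactor_coverage(expected, touched)
```

The stragglers set is the more useful output: anything in it is a site a reviewer would otherwise find later, which is exactly the failure mode this day of the trial is probing.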

Day 5 — read and decide

Lay the recorded runs side by side, compare the metrics below, and make the call. By Friday you have your own evidence, not a vendor's.

What to measure

| Metric | What it tells you |
| --- | --- |
| Files read per task | A grounded session reads dramatically fewer files. |
| Tokens consumed | Roughly tracks files read; matters at team scale. |
| First-attempt correctness | Did placement, naming, and schema references match what the team would have done? |
| Refactor coverage | Did the impact analysis catch every site, or were stragglers found later? |
| Time-to-working-result | Wall-clock time for a representative task end-to-end. |

For reference, the pre-release cleanup case study measured roughly half the tool calls and a third of the tokens for the same task on a Laravel + Vue project. Your numbers will vary; the shape of the result usually doesn't.
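Ratios of that shape are how the comparison is easiest to report. A two-line helper, with placeholder figures chosen to mirror the case-study shape rather than any measured data:

```python
def ratios(with_cix: dict, baseline: dict) -> dict:
    """Per-metric ratio of the cix run to the baseline (lower is better)."""
    return {k: round(with_cix[k] / baseline[k], 2) for k in with_cix}

# Placeholder figures shaped like the case-study result:
# half the tool calls, a third of the tokens.
print(ratios({"tool_calls": 40, "tokens": 30_000},
             {"tool_calls": 80, "tokens": 90_000}))
# {'tool_calls': 0.5, 'tokens': 0.33}
```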

What success looks like

Six to eight weeks after adoption, on a team that benefits, the visible signals are:

  • Reviewers spending less time catching mechanical issues — wrong-folder placement, duplicated helpers, fabricated columns.
  • Fewer incidents in the category cix prevents: schema mismatches, dead code in production, structural drift.
  • Lower token spend on coding tasks across the team.
  • New contributors productive faster on existing projects.

If after a fair trial you don't see those signals, cix is not the right fit for your situation. The honest answer matters more than the adoption.

When NOT to use cix

Skip or defer if:

  • Your codebase is primarily in an unsupported language (Lua is the canonical case — see Language support for the full tier list). The system is honest that the value is small here today.
  • Your project relies heavily on dynamic registration of routes, handlers, or tables. Coverage is partial; the index will be useful but not comprehensive.
  • You're greenfield with no established conventions yet. cix can still help, but the immediate payoff is smaller than on a mature codebase.
  • Your team isn't using AI coding assistants. This is obvious but worth saying. cix is infrastructure for assistants. Without one, the value isn't there.
  • You can't run a measurement. If you can't carve out a few hours over a week to compare side-by-side, you can't fairly evaluate any tool in this category.

Where to go next