CodeFlux: an offline pair programmer that respects your repo
A 7B-class code model running locally with project-aware retrieval. Why a smaller model with the right context beats a larger one without.
The thesis
A 7B-class code model, run locally and given the right slice of your repository as context, is more useful than a frontier-tier cloud model with no idea what your codebase looks like.
This is not a claim about benchmark scores. It is a claim about completions you actually accept. CodeFlux is built on top of that thesis.
Architecture
Three on-device components, cooperating:
- The model. A quantized 7B code model (we evaluate Qwen-2.5-Coder, DeepSeek-Coder-V2-Lite, and StarCoder2 derivatives; the fastest 4-bit variant we can ship for each platform). Inference runs via MLX on Apple silicon, llama.cpp on Windows and Linux, and ONNX Runtime on Windows on ARM.
- A repo index. A local vector index of your repository, refreshed on file save. We embed at three granularities: file, symbol, and line range. The index is a single SQLite file inside .codeflux/ in your repo.
- A retrieval layer. When you type, we look at the surrounding code, the file's imports, and the symbols nearby; we pull the top-K most relevant other files; we splice them into the prompt; we go.
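To make the index-then-retrieve loop concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not CodeFlux's actual implementation: the table layout, the `embed()` stand-in (a real index would use a learned code-embedding model), and the brute-force scan (at monorepo scale this would be an approximate-nearest-neighbor index).

```python
import math
import sqlite3

def embed(text: str) -> list[float]:
    # Stand-in embedding: hash character trigrams into a small vector,
    # then L2-normalize so dot product equals cosine similarity.
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_chunk(db: sqlite3.Connection, path: str, granularity: str, text: str) -> None:
    # One row per chunk: file-, symbol-, or line-range-level, as in the post.
    blob = ",".join(f"{v:.6f}" for v in embed(text))
    db.execute(
        "INSERT INTO chunks (path, granularity, text, vec) VALUES (?, ?, ?, ?)",
        (path, granularity, text, blob),
    )

def top_k(db: sqlite3.Connection, query: str, k: int = 3) -> list[tuple[str, str]]:
    # Brute-force cosine scan over every stored chunk; fine for a sketch.
    q = embed(query)
    scored = []
    for path, text, blob in db.execute("SELECT path, text, vec FROM chunks"):
        v = [float(x) for x in blob.split(",")]
        scored.append((sum(a * b for a, b in zip(q, v)), path, text))
    scored.sort(reverse=True)
    return [(p, t) for _, p, t in scored[:k]]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (path TEXT, granularity TEXT, text TEXT, vec TEXT)")
index_chunk(db, "auth/middleware.py", "symbol", "def require_token(request): ...")
index_chunk(db, "db/models.py", "symbol", "class User(Base): ...")
hits = top_k(db, "def require_token(req): ...", k=1)
```

The retrieved chunks are what gets spliced into the prompt ahead of the code you are editing, which is the whole trick: the model sees your repo's actual definitions, not a guess at them.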
Why local for code
- Your repo is the most sensitive thing you own. The only model that should ever see all of it is one running inside your machine.
- Your code is also the most idiomatic input the model will ever get. Cloud models normalize toward GitHub-average style; a local model conditioned on your repo writes code that fits.
- Latency on inline completion matters at the level of a few hundred milliseconds. A round trip to the cloud is a tax you pay on every keystroke.
What CodeFlux does today
- Inline completion with cancellation. As you keep typing, in-flight inferences are cancelled and restarted. The model never blocks the editor.
- Repo-grounded chat. Ask "where is the auth middleware?" and CodeFlux retrieves the relevant files, then answers — citing line numbers.
- Edits on selection. Highlight a function, ask for a refactor, and the change appears as a reviewable diff. You apply, dismiss, or edit.
- Tests on demand. "Generate tests for this file" produces a candidate test file in the project's existing test framework, with imports resolved correctly.
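The cancel-and-restart behavior in the first item can be sketched with asyncio. This is a hypothetical illustration of the pattern, not CodeFlux's code: `CompletionSession` and the simulated inference delay are assumptions; the point is that every keystroke cancels the in-flight request before starting a new one, so only the freshest completion ever lands.

```python
import asyncio

class CompletionSession:
    def __init__(self) -> None:
        self.task: asyncio.Task | None = None
        self.results: list[str] = []

    async def _infer(self, prefix: str) -> None:
        # Stand-in for model inference; the sleep simulates token latency.
        await asyncio.sleep(0.05)
        self.results.append(prefix + "<completion>")

    def on_keystroke(self, prefix: str) -> None:
        # Cancel the in-flight inference and restart with the new prefix,
        # so the editor never waits on a stale request.
        if self.task and not self.task.done():
            self.task.cancel()
        self.task = asyncio.ensure_future(self._infer(prefix))

async def main() -> list[str]:
    s = CompletionSession()
    for prefix in ["de", "def ", "def ha"]:
        s.on_keystroke(prefix)
        await asyncio.sleep(0.01)  # keystrokes arrive faster than inference
    await s.task                   # let the final request finish
    return s.results

results = asyncio.run(main())
```

Only the last request survives; the two stale ones are cancelled mid-flight and produce nothing.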
What CodeFlux does not do
- No autonomous agents. No long-running coding agent that takes actions in your repo without your review. We will reconsider when reliability earns it.
- No telemetry on what you ask or what we suggest. None. The hardest temptation in a developer tool, and the one we are most committed to refusing.
- No reliance on cloud for any feature. A network outage degrades nothing.
Honest limitations
- A 7B code model will not match a frontier 200B+ model on the hardest reasoning-heavy refactors. We give you a one-tap escape hatch to send a redacted snippet to a cloud model of your choice. Your call, your provider, your data redaction settings.
- Indexing very large monorepos is slow on first run. We keep a per-machine cache and a git-ignored .codeflux/ layout so repeat opens are instant.
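The escape hatch's redaction step can be sketched in a few lines. The rules below (mask string-literal contents and comments, two common places secrets and internal names live) are illustrative assumptions; the post says redaction settings are yours to configure, so treat this as one possible default, not the product's actual rule set.

```python
import re

def redact(snippet: str) -> str:
    # Mask string literal contents, which often hold keys and internal URLs.
    snippet = re.sub(r'"[^"\n]*"', '"<redacted>"', snippet)
    snippet = re.sub(r"'[^'\n]*'", "'<redacted>'", snippet)
    # Mask comments, which often name internal systems.
    snippet = re.sub(r"#[^\n]*", "# <redacted>", snippet)
    return snippet

src = 'token = "sk-live-abc123"  # staging gateway key'
clean = redact(src)
```

Structure survives, secrets do not, which is usually enough context for a cloud model to reason about a hard refactor.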
CodeFlux is not the model with the highest leaderboard score. It is the model with the highest accepted-completion rate on your code, because it has actually read your code.
Want updates like this in your inbox?
No newsletter platform. No tracking. We send a single email per launch.
Subscribe