Journal
Engineering · Published 2026-01-18 · 8 min read

CodeFlux: an offline pair programmer that respects your repo

A 7B-class code model running locally with project-aware retrieval. Why a smaller model with the right context beats a larger one without.

The thesis

A 7B-class code model, run locally and given the right slice of your repository as context, is more useful than a frontier-tier cloud model with no idea what your codebase looks like.

This is not a claim about benchmark scores. It is a claim about completions you actually accept. CodeFlux is built on top of that thesis.

Architecture

Three on-device components, cooperating:

  1. The model. A quantized 7B code model (we evaluate Qwen-2.5-Coder, DeepSeek-Coder-V2-Lite, and StarCoder2 derivatives; the fastest 4-bit variant we can ship for each platform). Inference runs via MLX on Apple silicon, llama.cpp on Windows and Linux, and ONNX Runtime on Windows ARM.
  2. A repo index. A local vector index of your repository, refreshed on file save. We embed at three granularities: file, symbol, and line range. The index is a single SQLite file inside .codeflux/ in your repo.
  3. A retrieval layer. When you type, we look at the surrounding code, the file imports, and the symbols nearby; we pull the top-K most relevant other files; we splice them into the prompt; we go.
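The retrieval step can be sketched in a few lines. This is a hypothetical illustration, not CodeFlux's actual code: `IndexedChunk`, `buildPrompt`, and the flat-array embeddings stand in for whatever the real index stores, and ranking is plain cosine similarity over precomputed vectors.

```typescript
// Illustrative sketch only — names and shapes are assumptions,
// not CodeFlux's real interfaces.

interface IndexedChunk {
  path: string;        // file the chunk came from
  text: string;        // the chunk's source text
  embedding: number[]; // precomputed vector from the local index
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed chunks against the embedding of the code around the
// cursor, then splice the top-K chunks into the prompt ahead of the
// local context.
function buildPrompt(
  contextEmbedding: number[],
  localContext: string,
  index: IndexedChunk[],
  k: number,
): string {
  const ranked = [...index].sort(
    (x, y) =>
      cosine(contextEmbedding, y.embedding) -
      cosine(contextEmbedding, x.embedding),
  );
  const retrieved = ranked
    .slice(0, k)
    .map((c) => `// From ${c.path}\n${c.text}`)
    .join("\n\n");
  return `${retrieved}\n\n${localContext}`;
}
```

The real system additionally weights by file imports and nearby symbols, but the shape is the same: rank, slice, splice, generate.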

Why local for code

  • Your repo is the most sensitive thing you own. The only model that should ever see all of it is one running inside your machine.
  • Your code is also the most idiomatic input the model will ever get. Cloud models normalize toward GitHub-average style; a local model conditioned on your repo writes code that fits.
  • Latency on inline completion matters at the level of a few hundred milliseconds. A round-trip to the cloud is a tax you pay on every keystroke.

What CodeFlux does today

  • Inline completion with cancellation: as you keep typing, in-flight inferences are cancelled and restarted. The model never blocks the editor.
  • Repo-grounded chat. Ask "where is the auth middleware?" and CodeFlux retrieves the relevant files, then answers — citing line numbers.
  • Edits on selection. Highlight a function, ask for a refactor, and the diff appears as a normal review experience. You apply, dismiss, or edit.
  • Tests on demand. "Generate tests for this file" produces a candidate test file in the project's existing test framework, with imports resolved correctly.
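The keystroke-cancellation behavior in the first bullet follows a standard pattern, sketched below with Web-standard `AbortController`. Everything here is an assumption for illustration: `CompletionSession` and the injected `runInference` callback are hypothetical names, not CodeFlux's actual API.

```typescript
// Illustrative sketch — not CodeFlux's real implementation.
// Each keystroke aborts the in-flight inference and starts a new one,
// so the model never blocks the editor.

class CompletionSession {
  private inflight: AbortController | null = null;

  async complete(
    prompt: string,
    runInference: (prompt: string, signal: AbortSignal) => Promise<string>,
  ): Promise<string | null> {
    this.inflight?.abort();            // cancel the previous request
    const controller = new AbortController();
    this.inflight = controller;
    try {
      return await runInference(prompt, controller.signal);
    } catch (err) {
      // An aborted request is expected; return null and let the
      // newer request supply the completion. Re-throw real errors.
      if (controller.signal.aborted) return null;
      throw err;
    }
  }
}
```

The point of the pattern is that cancellation is cooperative: the inference backend must watch the signal and stop decoding early, which is what makes per-keystroke restarts cheap.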

What CodeFlux does not do

  • No autonomous agents. No long-running coding agent that takes actions in your repo without your review. We will reconsider when reliability earns it.
  • No telemetry on what you ask or what we suggest. None. The hardest temptation in a developer tool, and the one we are most committed to refusing.
  • No reliance on cloud for any feature. A network outage degrades nothing.

Honest limitations

  • A 7B code model will not match a frontier 200B+ model on the hardest reasoning-heavy refactors. We give you a one-tap escape hatch to send a redacted snippet to a cloud model of your choice. Your call, your provider, your data redaction settings.
  • Indexing very large monorepos is slow on first run. We keep a per-machine cache and ship a git-ignored .codeflux/ layout so repeat opens are instant.

CodeFlux is not the model with the highest leaderboard score. It is the model with the highest accepted-completion rate on your code, because it has actually read your code.

Want updates like this in your inbox?

No newsletter platform. No tracking. We send a single email per launch.
