Journal
Engineering · Published 2026-01-18 · 8 min read

CodeFlux: an offline pair programmer that respects your repo

A 7B-class code model running locally with project-aware retrieval. Why a smaller model with the right context beats a larger one without.

The thesis

A 7B-class code model, run locally and given the right slice of your repository as context, is more useful than a frontier-tier cloud model with no idea what your codebase looks like.

This is not a claim about benchmark scores. It is a claim about completions you actually accept. CodeFlux is built on top of that thesis.

Architecture

Three on-device components, cooperating:

  1. The model. A quantized 7B code model (we evaluate Qwen-2.5-Coder, DeepSeek-Coder-V2-Lite, and StarCoder2 derivatives; the fastest 4-bit variant we can ship for each platform). Inference runs via MLX on Apple silicon, llama.cpp on Windows and Linux, and ONNX Runtime on Windows ARM.
  2. A repo index. A local vector index of your repository, refreshed on file save. We embed at three granularities: file, symbol, and line range. The index is a single SQLite file inside .codeflux/ in your repo.
  3. A retrieval layer. When you type, we look at the surrounding code, the file imports, and the symbols nearby; we pull the top-K most relevant other files; we splice them into the prompt; we go.
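The retrieval step can be sketched in a few lines. This is a hypothetical illustration, not CodeFlux's actual code: `IndexedChunk`, `buildPrompt`, and the flat-array embeddings stand in for whatever the real index stores, and ranking is plain cosine similarity over precomputed vectors.

```typescript
// Illustrative sketch only — names and shapes are assumptions,
// not CodeFlux's real interfaces.

interface IndexedChunk {
  path: string;        // file the chunk came from
  text: string;        // the chunk's source text
  embedding: number[]; // precomputed vector from the local index
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed chunks against the embedding of the code around the
// cursor, then splice the top-K chunks into the prompt ahead of the
// local context.
function buildPrompt(
  contextEmbedding: number[],
  localContext: string,
  index: IndexedChunk[],
  k: number,
): string {
  const ranked = [...index].sort(
    (x, y) =>
      cosine(contextEmbedding, y.embedding) -
      cosine(contextEmbedding, x.embedding),
  );
  const retrieved = ranked
    .slice(0, k)
    .map((c) => `// From ${c.path}\n${c.text}`)
    .join("\n\n");
  return `${retrieved}\n\n${localContext}`;
}
```

The real system additionally weights by file imports and nearby symbols, but the shape is the same: rank, slice, splice, generate.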

Why local for code

  • Your repo is the most sensitive thing you own. The only model that should ever see all of it is one running inside your machine.
  • Your code is also the most idiomatic input the model will ever get. Cloud models normalize toward GitHub-average style; a local model conditioned on your repo writes code that fits.
  • Latency on inline completion matters at the level of a few hundred milliseconds. A round-trip to the cloud is a tax you pay on every keystroke.

What CodeFlux does today

  • Inline completion with cancellation: as you keep typing, in-flight inferences are cancelled and restarted. The model never blocks the editor.
  • Repo-grounded chat. Ask "where is the auth middleware?" and CodeFlux retrieves the relevant files, then answers — citing line numbers.
  • Edits on selection. Highlight a function, ask for a refactor, and the diff appears as a normal review experience. You apply, dismiss, or edit.
  • Tests on demand. "Generate tests for this file" produces a candidate test file in the project's existing test framework, with imports resolved correctly.
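The keystroke-cancellation behavior in the first bullet follows a standard pattern, sketched below with Web-standard `AbortController`. Everything here is an assumption for illustration: `CompletionSession` and the injected `runInference` callback are hypothetical names, not CodeFlux's actual API.

```typescript
// Illustrative sketch — not CodeFlux's real implementation.
// Each keystroke aborts the in-flight inference and starts a new one,
// so the model never blocks the editor.

class CompletionSession {
  private inflight: AbortController | null = null;

  async complete(
    prompt: string,
    runInference: (prompt: string, signal: AbortSignal) => Promise<string>,
  ): Promise<string | null> {
    this.inflight?.abort();            // cancel the previous request
    const controller = new AbortController();
    this.inflight = controller;
    try {
      return await runInference(prompt, controller.signal);
    } catch (err) {
      // An aborted request is expected; return null and let the
      // newer request supply the completion. Re-throw real errors.
      if (controller.signal.aborted) return null;
      throw err;
    }
  }
}
```

The point of the pattern is that cancellation is cooperative: the inference backend must watch the signal and stop decoding early, which is what makes per-keystroke restarts cheap.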

What CodeFlux does not do

  • No autonomous agents. No long-running coding agent that takes actions in your repo without your review. We will reconsider when reliability earns it.
  • No telemetry on what you ask or what we suggest. None. The hardest temptation in a developer tool, and the one we are most committed to refusing.
  • No reliance on cloud for any feature. A network outage degrades nothing.

Honest limitations

  • A 7B code model will not match a frontier 200B+ model on the hardest reasoning-heavy refactors. We give you a one-tap escape hatch to send a redacted snippet to a cloud model of your choice. Your call, your provider, your data redaction settings.
  • Indexing very large monorepos is slow on first run. We keep a per-machine cache and ship a git-ignored .codeflux/ layout so repeat opens are instant.

CodeFlux is not the model with the highest leaderboard score. It is the model with the highest accepted-completion rate on your code, because it has actually read your code.

Want updates like this in your inbox?

No newsletter platform. No tracking. We send a single email per launch.
