VisionFlux: a roadmap for local visual understanding
Our second product takes shape. What local vision-language models can do today — and where we are betting they go next.
Today's local VLMs are real
A year ago, "vision-language model on a phone" was a research thread. Today it is a product surface. Quantized 2–4B parameter VLMs (Qwen-VL, MiniCPM-V, Idefics, MLX-VLM ports of Llama-3.2-Vision) answer real questions about real images at interactive latency on flagship NPUs.
The capability frontier has crossed the consumer-product line: OCR with reasoning, document Q&A, scene description, screen understanding, and visual grounding now all fit in a phone-sized memory budget when carefully quantized.
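To make "phone-sized memory budget" concrete, here is a back-of-the-envelope estimate. The `overhead` factor is an illustrative guess (covering higher-precision embeddings, KV cache, and runtime buffers), not a measurement of any specific model.

```python
def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough resident-memory estimate for a quantized model's weights.

    overhead is an assumed fudge factor for layers kept at higher
    precision, the KV cache, and runtime buffers.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight * overhead / 1e9

# A 3B-parameter VLM at 4-bit weights: roughly 1.8 GB resident,
# which is why the 2-4B class fits on flagship phones while 7B+ is tight.
print(round(model_memory_gb(3.0, 4), 2))
```

The same arithmetic explains the post's framing: halving bits-per-weight roughly halves the footprint, which is what moves a model from "server-only" to "in your pocket."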
What VisionFlux will ship first
- Document Q&A. Point at a page, ask anything. Layout-aware, table-aware, bilingual.
- Translate the world. Menus, signs, packaging — overlay translation in place, on-device, zero round-trip to a server.
- Accessibility narration. Describe a scene to a low-vision user with adjustable verbosity, controllable refresh rate, and a privacy guarantee no cloud-narration product can match.
- Receipt and form ingestion. Photograph an expense, get structured fields, file it locally — no third-party SaaS.
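The receipt-ingestion item above boils down to prompting the on-device VLM for constrained JSON and validating what comes back. A minimal sketch, assuming any local VLM runner whose reply can be treated as text; the prompt, field names, and the stand-in `reply` string are all hypothetical:

```python
import json
from dataclasses import dataclass

# Hypothetical prompt for a local VLM; the model runner itself is assumed
# and not shown -- any llama.cpp / MLX-style backend would slot in here.
RECEIPT_PROMPT = (
    "Extract merchant, date (YYYY-MM-DD), currency, and total from this "
    "receipt. Reply with a single JSON object and nothing else."
)

@dataclass
class Receipt:
    merchant: str
    date: str
    currency: str
    total: float

def parse_receipt(raw: str) -> Receipt:
    """Validate a model reply into typed fields, tolerating stray text
    before or after the JSON object (small models often add chatter)."""
    start, end = raw.find("{"), raw.rfind("}") + 1
    obj = json.loads(raw[start:end])
    return Receipt(
        merchant=str(obj["merchant"]),
        date=str(obj["date"]),
        currency=str(obj["currency"]),
        total=float(obj["total"]),
    )

# Stand-in for a real model reply, including the chatter we strip:
reply = ('Sure! {"merchant": "Cafe Luna", "date": "2025-01-12", '
         '"currency": "EUR", "total": 18.40}')
print(parse_receipt(reply))
```

Keeping validation separate from generation is the design point: the model stays swappable, and "file it locally" only ever sees typed fields.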
Where we are going
- Persistent, opt-in local memory of the things you have looked at, indexed by a small embedding model so you can ask "where did I see that thing?"
- Scene mode — a continuously-running narration of a changing environment, throttled by motion and battery.
- Multimodal notes — capture a thought as a photo + voice clip and let the model file it under the right project, with the right tags.
- AR overlays when the platform allows — translation, accessibility, contextual help, all rendered locally.
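The "local memory" item above is, structurally, a tiny retrieval index over captions the VLM produced for past captures. A minimal sketch: the bag-of-words `toy_embed` is a stand-in so the example runs anywhere; a real build would swap in a small sentence-embedding model, with cosine search unchanged.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Stand-in embedding: bag of lowercase words. Placeholder for a
    small on-device sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VisualMemory:
    """Opt-in index of captions generated for past captures."""
    def __init__(self):
        self.entries = []  # (caption, embedding) pairs, all on-device

    def add(self, caption: str) -> None:
        self.entries.append((caption, toy_embed(caption)))

    def search(self, query: str, k: int = 3) -> list:
        q = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]),
                        reverse=True)
        return [caption for caption, _ in ranked[:k]]

mem = VisualMemory()
mem.add("red bicycle locked outside the bakery")
mem.add("conference badge with a QR code")
mem.add("whiteboard sketch of the onboarding flow")
print(mem.search("where did I see that bicycle?", k=1))
# -> ['red bicycle locked outside the bakery']
```

Because both captions and embeddings live on-device, "where did I see that thing?" never leaves the phone, which is the whole point of the feature.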
What we are not building
We are not building a cloud-vision API competitor. The frontier of "what's the most you can extract from a single image with unlimited compute" is not where on-device wins. We win at "what's the most you can get from the camera in your pocket, right now, with no network."
That is a different product, and it is the one we want to ship.
Want updates like this in your inbox?
No newsletter platform. No tracking. We send a single email per launch.