About — VectorLab

What we do differently

Fix first

Audit before building

Most AI implementations underperform. We benchmark accuracy, latency, and cost-per-inference before writing any new code. Fix what exists before adding complexity.
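As a rough illustration of what such an audit measures, here is a minimal sketch in Python. The `benchmark` function, its parameters, and the idea of passing in a total bill are assumptions made for the example, not VectorLab's actual tooling.

```python
import statistics
import time


def benchmark(predict, inputs, labels, total_cost_usd):
    """Report accuracy, latency percentiles, and cost per inference.

    `predict` is any callable under test; `total_cost_usd` is the bill for
    the whole run (hypothetical, e.g. taken from a provider invoice).
    Needs at least two inputs, because quantiles() requires two data points.
    """
    latencies_ms = []
    correct = 0
    for x, expected in zip(inputs, labels):
        start = time.perf_counter()
        got = predict(x)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(got == expected)

    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "accuracy": correct / len(inputs),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "cost_per_inference_usd": total_cost_usd / len(inputs),
    }
```

Running the same function before and after a change gives two reports that can be compared directly.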

Tiny models

Compress for the edge

We compress models with quantization (INT4/INT8), pruning, and distillation, then deploy them where they need to run: on-device, at the edge, without cloud dependency.
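To make the quantization step concrete, the sketch below shows symmetric per-tensor INT8 quantization in plain Python. It is a simplified illustration under stated assumptions (one scale per tensor, no calibration data, hypothetical function names), not a description of any specific toolchain.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 8-bit range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]
```

Each weight is stored as one byte instead of four, and the reconstruction error per weight is at most half of `scale`.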

Edge-native

Cloudflare-first architecture

Inference on Cloudflare Workers. State in D1 and R2. Vector search with SQLite and sqlite-vec. No centralized infrastructure to manage.
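The idea behind vector search on SQLite can be sketched with Python's standard library alone. This example is an assumption-laden simplification: it uses a local in-memory database, a hypothetical `docs` table, and brute-force cosine similarity with no vector extension, so it shows the concept but not the production setup.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): 4 bytes per float32 value."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(conn, query, k=3):
    """Score every stored embedding against the query; return the best k."""
    rows = conn.execute("SELECT id, text, embedding FROM docs").fetchall()
    scored = [(cosine(query, unpack(blob)), doc_id, text) for doc_id, text, blob in rows]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical toy data: 2-dimensional embeddings for three documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")
conn.executemany(
    "INSERT INTO docs (text, embedding) VALUES (?, ?)",
    [("alpha", pack([1.0, 0.0])), ("beta", pack([0.0, 1.0])), ("gamma", pack([0.7, 0.7]))],
)
```

A full scan like this is fine for small collections; larger ones are where a dedicated vector index earns its place.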

Measurable

Before/after metrics

Every engagement ships with comparison dashboards: latency percentiles, accuracy scores, cost-per-inference, and reliability metrics, all documented and verifiable.
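A before/after comparison reduces to a relative change per metric. The sketch below assumes each run is summarized as a plain dictionary of metric names to numbers; the `compare` function and the metric names in the example are hypothetical.

```python
def compare(before, after):
    """Percentage change for every metric present in both runs."""
    report = {}
    for name in sorted(before.keys() & after.keys()):
        if before[name] == 0:
            continue  # relative change is undefined against a zero baseline
        report[name] = (after[name] - before[name]) / before[name] * 100.0
    return report


# Made-up numbers, purely to show the shape of the output.
example = compare(
    {"p95_ms": 200.0, "accuracy": 0.80},
    {"p95_ms": 150.0, "accuracy": 0.84},
)
```

A negative value means the metric went down, which is an improvement for latency and cost and a regression for accuracy.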

Productized

Fixed-scope services

Clear deliverables and pricing. No open-ended consulting engagements. You know what you get and what it costs before we start.

Low risk

Test-run model

Start with 10–20 hours a week at $250/hr. See results in the first month. Scale up, move to a retainer, or walk away.

Tech stack

Models & inference

Gemini Flash
GLM 4.7 Flash on Cerebras
Claude SDK
ONNX Runtime
TensorFlow Lite

Infrastructure

Cloudflare Workers
Cloudflare Pages
Cloudflare D1 / R2
SQLite + sqlite-vec
Postgres

Frameworks & observability

Pydantic AI
FastAPI
Pydantic Logfire
TinyML tooling
Model quantization (INT4/INT8)
