documentation
Everything you need,
nothing leaves your infra.
CostObs is self-hostable AI/LLM cost observability with per-request attribution. These docs cover the five-minute setup, the three SDKs, and the operational pieces. The deeper reference lives in the repo's docs/ directory.
Quickstart
One command brings up ClickHouse, Postgres, migrations, the ingest API, the alert evaluator, the (idle until opted-in) billing sync, and the dashboard:
git clone https://github.com/officialasishkumar/costobs
cd costobs
cp .env.example .env
docker compose up
Open http://localhost:3000. A default org and a dev API key (costobs_dev_secret_key) are seeded on boot; the dashboard's first-run screen walks you through sending your first event.
Demo data
No provider key handy? Populate every chart with 30k synthetic events (45 days, weekday seasonality, simulated provider bills for the reconciliation page):
make demo-data
Python SDK
pip install costobs
import costobs
from openai import OpenAI

costobs.configure(ingest_url="http://localhost:8080", api_key="costobs_dev_secret_key")
client = costobs.wrap(OpenAI(), team="payments", environment="prod")

with costobs.request_context(customer_id="cust-42", trace_id="trace-abc"):
    client.chat.completions.create(
        model="gpt-4o",
        messages=[...],
        feature="summarize",  # per-call metadata
        prompt_version="v3",
    )

costobs.flush()  # drain before short-lived scripts exit
Supported clients
- OpenAI() / AsyncOpenAI(): chat, responses, embeddings, streaming (usage injected via include_usage)
- OpenAI-compatible via base_url: xAI/Grok, Together, Fireworks, OpenRouter, Groq, DeepSeek, Mistral, Azure; the real provider is stamped automatically
- anthropic.Anthropic(): including extended-thinking token accounting in streams
- google-genai: Gemini
- boto3.client("bedrock-runtime"): Converse API
- costobs.wrap(litellm): pass the module; provider derived per call from the routing prefix
Decorator
@costobs.trace(feature="search", team="discovery")
def handle_query(q):
    ...  # works on async functions too
TypeScript SDK
npm install @costobs/sdk
import OpenAI from "openai";
import { wrap, requestContext, flush } from "@costobs/sdk";

const client = wrap(new OpenAI(), { team: "payments", environment: "prod" });

await requestContext({ customer_id: "cust-42" }, async () => {
  await client.chat.completions.create({
    model: "gpt-4o",
    messages: [...],
    costobs: { feature: "summarize", promptVersion: "v3" },
  });
});

await flush();
Vercel AI SDK
import { generateText, streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { trackGenerateText, trackStreamText } from "@costobs/sdk";

const result = await trackGenerateText(
  generateText({ model: openai("gpt-4o"), prompt }),
  { feature: "summarize", customer_id: "cust-42" },
);

const stream = trackStreamText(streamText({ model, prompt }), { feature: "chat" });
Go SDK
Go takes the transport seam: wrap any provider SDK built on net/http with an observing RoundTripper. Streams are teed; unknown hosts pass through untouched.
co := costobs.New(costobs.Config{
	IngestURL:   "http://localhost:8080",
	APIKey:      "costobs_dev_secret_key",
	Team:        "payments",
	Environment: "prod",
})
defer co.Close()

httpClient := &http.Client{Transport: co.Transport(nil)}
oai := openai.NewClient(option.WithHTTPClient(httpClient))

ctx = costobs.WithMetadata(ctx, costobs.Metadata{
	CustomerID: "cust-42",
	Feature:    "chat",
	Tags:       map[string]string{"pr": "1234"},
})
resp, err := oai.Chat.Completions.New(ctx, ...)
Metadata & attribution
Nine first-class fields map to event columns; anything else becomes a tag. Three layers, later wins:
| layer | python | typescript | go |
|---|---|---|---|
| wrap-time defaults | wrap(client, team=…) | wrap(client, {team}) | Config{Team: …} |
| request scope | request_context() / @trace | requestContext() | WithMetadata(ctx, …) |
| per call | feature=, prompt_version= | costobs: {…} | — |
Fields: customer_id, user_id, trace_id, feature, team, service, environment, prompt_key, prompt_version, plus free-form tags.
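The "later wins" rule can be pictured as a plain dictionary merge. The sketch below is not the SDK's implementation: `merge_metadata` and `FIRST_CLASS` are hypothetical names used only for illustration, and it assumes unrecognized keys end up as string tags, as described above.

```python
# Hypothetical sketch of layered metadata resolution; not the SDK's own code.
FIRST_CLASS = {
    "customer_id", "user_id", "trace_id", "feature", "team",
    "service", "environment", "prompt_key", "prompt_version",
}

def merge_metadata(wrap_defaults, request_scope, per_call):
    """Merge three layers; a later layer overrides an earlier one."""
    merged = {}
    for layer in (wrap_defaults, request_scope, per_call):
        merged.update(layer)

    # First-class fields map to columns; everything else becomes a tag.
    columns = {k: v for k, v in merged.items() if k in FIRST_CLASS}
    tags = {k: str(v) for k, v in merged.items() if k not in FIRST_CLASS}
    return columns, tags

columns, tags = merge_metadata(
    {"team": "payments", "environment": "prod", "pr": "1234"},  # wrap-time
    {"customer_id": "cust-42", "feature": "chat"},              # request scope
    {"feature": "summarize", "prompt_version": "v3"},           # per call
)
```

Here the per-call `feature` replaces the request-scope value, while `pr` falls through to the tags because it is not one of the nine first-class fields.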
record() — anything else
For providers without an adapter (Deepgram, ElevenLabs, gRPC APIs, batch jobs), record the call manually. It prices locally, merges the active context, and ships async like everything else:
costobs.record(
    provider="elevenlabs",
    model="eleven_multilingual_v2",
    operation="audio",
    characters=len(text),
    feature="narration",
    customer_id="cust-7",
)

costobs.record(
    provider="deepgram",
    model="nova-3",
    operation="audio",
    audio_seconds=resp.metadata.duration,
)
Pricing file
One versioned YAML file (shared/pricing/) is the source of truth for SDKs and backend alike. It covers per-token, cached-token, reasoning, tool, image-tier, audio-second, and per-character rates, all computed with exact decimal math. Edit it, bump version, and run make sync-pricing; no SDK release is needed. Unknown models are tracked at $0 until you add a rate, so nothing is silently lost.
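To illustrate what exact decimal pricing with a $0 fallback might look like, here is a minimal sketch. The rate table layout, the per-million-token unit, the rates themselves, and the `estimate_cost` helper are all assumptions made for the example; the real YAML schema in shared/pricing/ may differ.

```python
from decimal import Decimal

# Hypothetical rate table: USD per one million tokens. Placeholder numbers,
# not real provider prices, and not the layout of the actual pricing file.
RATES = {
    ("example-provider", "example-model"): {
        "input": Decimal("2.50"),
        "output": Decimal("10.00"),
    },
}
PER_MILLION = Decimal(1_000_000)

def estimate_cost(provider, model, input_tokens, output_tokens):
    """Return the estimated cost in USD as an exact Decimal."""
    rate = RATES.get((provider, model))
    if rate is None:
        # Unknown model: keep the event and price it at $0.
        return Decimal("0")
    total = (
        Decimal(input_tokens) * rate["input"]
        + Decimal(output_tokens) * rate["output"]
    )
    return total / PER_MILLION

known = estimate_cost("example-provider", "example-model", 1200, 300)
unknown = estimate_cost("example-provider", "not-in-file", 1200, 300)
```

Using `Decimal` rather than `float` avoids binary rounding error accumulating across millions of small per-request amounts.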
Billing sync & reconciliation
SDK costs are estimates. billsyncd pulls actual daily bills from provider admin APIs into billed_daily; the Reconcile page shows drift per provider and untracked spend, meaning billed cost that no SDK event accounts for.
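As a rough picture of the comparison behind the Reconcile page, the sketch below contrasts estimated and billed totals per provider. It is an assumption-laden illustration, not the dashboard's query: `reconcile` is a hypothetical helper, drift is assumed to mean billed minus estimated, and a provider is flagged untracked when it has a bill but no SDK-estimated spend at all.

```python
from decimal import Decimal

def reconcile(estimated, billed):
    """Compare SDK estimates with provider bills for the same period.

    Both arguments map provider name -> total USD.
    """
    report = {}
    for provider in sorted(set(estimated) | set(billed)):
        est = estimated.get(provider, Decimal("0"))
        bill = billed.get(provider, Decimal("0"))
        report[provider] = {
            "estimated_usd": est,
            "billed_usd": bill,
            "drift_usd": bill - est,  # assumed definition of drift
            # Billed spend with no SDK events behind it at all.
            "untracked": est == 0 and bill > 0,
        }
    return report

report = reconcile(
    estimated={"openai": Decimal("98.20"), "anthropic": Decimal("40.00")},
    billed={
        "openai": Decimal("101.50"),
        "anthropic": Decimal("40.00"),
        "gemini": Decimal("12.34"),
    },
)
```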
opt-in by design
billsyncd is the only CostObs component that makes outbound calls, and only when you set a key. Without keys it idles, fully offline.
# .env
COSTOBS_BILLING_OPENAI_ADMIN_KEY=sk-admin-...
COSTOBS_BILLING_ANTHROPIC_ADMIN_KEY=sk-ant-admin-...
Air-gapped or unsupported provider? Import invoices straight into the table:
INSERT INTO billed_daily (org_id, provider, date, billed_usd, source)
VALUES ('default', 'gemini', '2026-06-01', 12.34, 'import');
Cost per PR / engineer / experiment
Tag calls in CI, then slice by any tag on the Breakdown page (/breakdown?tag=pr):
# .github/workflows/preview.yml
env:
  PR_NUMBER: ${{ github.event.pull_request.number }}
  PR_AUTHOR: ${{ github.event.pull_request.user.login }}
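import os

import costobs
from openai import OpenAI

# pr and engineer are not first-class fields, so they are stored as tags
client = costobs.wrap(
    OpenAI(),
    environment="preview",
    pr=os.environ.get("PR_NUMBER", ""),
    engineer=os.environ.get("PR_AUTHOR", ""),
)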
Alerts
Three rule kinds, evaluated by alertd on an interval, delivered to Slack or any webhook with per-rule cooldowns:
| kind | fires when | config |
|---|---|---|
| daily_threshold | today's spend > limit | {"threshold_usd": 50} |
| spike | today > factor × baseline | {"factor": 2, "baseline": "rolling_7d"} |
| monthly_budget | month-to-date > budget | {"budget_usd": 1000} |
Rules are rows in Postgres (alert_rules + alert_targets), scopeable to any combination of provider, model, feature, team, customer, user, or environment.
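The three conditions in the table can be expressed as one small predicate. This is a sketch of the comparison logic only, under the assumption that spend figures are already aggregated for the rule's scope; `rule_fires` is a hypothetical name, and cooldowns, scoping, and delivery are left out.

```python
def rule_fires(kind, config, today_usd, baseline_usd=0.0, month_to_date_usd=0.0):
    """Evaluate one rule against already-aggregated spend figures."""
    if kind == "daily_threshold":
        return today_usd > config["threshold_usd"]
    if kind == "spike":
        # baseline_usd is assumed to be the figure named by config["baseline"],
        # for example a rolling 7-day average.
        return today_usd > config["factor"] * baseline_usd
    if kind == "monthly_budget":
        return month_to_date_usd > config["budget_usd"]
    raise ValueError(f"unknown rule kind: {kind}")

fired = [
    rule_fires("daily_threshold", {"threshold_usd": 50}, today_usd=62.0),
    rule_fires("spike", {"factor": 2, "baseline": "rolling_7d"},
               today_usd=62.0, baseline_usd=40.0),
    rule_fires("monthly_budget", {"budget_usd": 1000},
               today_usd=62.0, month_to_date_usd=640.0),
]
```

With these inputs only the daily threshold fires: $62 exceeds $50, but it is below twice the $40 baseline, and $640 is under the $1,000 budget.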
Self-hosting
Mode A — Docker Compose
The quickstart above. Single machine, bundled single-node ClickHouse and Postgres, data in named volumes. Good to ~10M events on a laptop.
Mode B — Kubernetes (Helm)
helm install costobs deploy/helm/costobs \
  -f deploy/k8s-examples/values-small.yaml \
  --namespace costobs --create-namespace
Scaling, TLS/ingress, OIDC, and bring-your-own external Postgres + ClickHouse are covered in docs/self-hosting.md. Every service exposes /healthz and Prometheus /metrics.
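A deployment smoke test can poll the health endpoint of each service. The sketch below uses only the standard library and assumes that a healthy service answers /healthz with HTTP 200; `is_healthy` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request

def is_healthy(base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if GET {base_url}/healthz answers with HTTP 200."""
    try:
        with opener(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTP error.
        return False
```

For the Compose quickstart this might be called as `is_healthy("http://localhost:8080")` against the ingest API.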
Architecture
YOUR APP ──(real call, sync)──────────────► LLM PROVIDER
│ costobs SDK: local cost calc, build event (NO content)
▼
background queue ──(async batched POST)──► ingest (Go :8080)
│ auth + dedup
┌─────────────────────┴─────────────┐
▼ ▼
ClickHouse (events + rollups) Postgres (orgs, keys, rules)
│ │
▼ │
dashboard (Next.js :3000) ◄─────────────────────┘
alertd (Go :8081) → Slack / webhook
billsyncd (Go :8082, opt-in) → billed_daily
ClickHouse materialized views maintain rollups (cost_daily, cost_attr_hourly, prompt_version_daily) so every dashboard page except the raw-request drill-down stays sub-second at 10M+ events.
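The SDK half of the diagram (build an event, queue it, POST in batches off the request path) follows a common producer/consumer pattern. The sketch below shows that pattern in general terms; `BatchingQueue`, its parameters, and the drop-on-error behavior are assumptions for illustration and do not describe the SDK's actual internals.

```python
import queue
import threading

class BatchingQueue:
    """Collect events and hand them to `send` in batches on a worker thread."""

    def __init__(self, send, batch_size=100, flush_interval=1.0):
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event):
        # Called on the request path: no network I/O happens here.
        self._queue.put(event)

    def flush(self):
        # Block until every enqueued event has been handed to `send`.
        self._queue.join()

    def _run(self):
        batch = []
        while True:
            timed_out = False
            try:
                batch.append(self._queue.get(timeout=self._flush_interval))
            except queue.Empty:
                timed_out = True
            # Ship when the batch is full, or when the queue has gone quiet.
            if batch and (timed_out or len(batch) >= self._batch_size):
                try:
                    self._send(batch)
                except Exception:
                    pass  # a real client would retry or log; this sketch drops
                for _ in batch:
                    self._queue.task_done()
                batch = []

delivered = []
events = BatchingQueue(delivered.append, batch_size=2, flush_interval=0.05)
for n in range(5):
    events.enqueue({"n": n})
events.flush()
```

Because the application thread only ever calls `enqueue`, a slow or unreachable ingest endpoint delays delivery of events but not the provider call itself.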
Privacy guarantees
- No prompt/response content — the wire schema has no field for it; it cannot be stored accidentally.
- No phone home — no telemetry, no license checks, no cloud control plane.
- No proxy — provider calls never route through CostObs; observation is off the hot path.
- One audited exception — billsyncd calls provider billing APIs, only when you set its keys.
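One way to picture the first guarantee is an event builder that rejects any key outside a fixed allow-list, so prompt or response text has nowhere to go. The field list and `build_event` below are hypothetical and only loosely based on the names used on this page; the real wire schema is defined in the repo.

```python
# Hypothetical allow-list; the actual wire schema may use different fields.
ALLOWED_FIELDS = frozenset({
    "provider", "model", "operation",
    "input_tokens", "output_tokens", "cost_usd",
    "customer_id", "user_id", "trace_id", "feature", "team",
    "service", "environment", "prompt_key", "prompt_version",
    "tags",
})

def build_event(**fields):
    """Build an event dict, rejecting any key outside the allow-list."""
    unexpected = sorted(set(fields) - ALLOWED_FIELDS)
    if unexpected:
        raise ValueError(f"fields not in schema: {unexpected}")
    return dict(fields)

event = build_event(
    provider="openai", model="gpt-4o",
    input_tokens=1200, output_tokens=300, feature="summarize",
)

try:
    build_event(provider="openai", model="gpt-4o", messages=["hello"])
    rejected = False
except ValueError:
    rejected = True
```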