documentation

Everything you need,
nothing leaves your infra.

CostObs is self-hostable AI/LLM cost observability with per-request attribution. These docs cover the five-minute setup, the three SDKs, and the operational pieces. The deeper reference lives in the repo's docs/ directory.

Quickstart

One command brings up ClickHouse, Postgres, migrations, the ingest API, the alert evaluator, the billing sync (idle until you opt in), and the dashboard:

git clone https://github.com/officialasishkumar/costobs
cd costobs
cp .env.example .env
docker compose up

Open http://localhost:3000. A default org and a dev API key (costobs_dev_secret_key) are seeded on boot — the dashboard's first-run screen walks you through sending your first event.

Demo data

No provider key handy? Populate every chart with 30k synthetic events (45 days, weekday seasonality, simulated provider bills for the reconciliation page):

make demo-data

Python SDK

pip install costobs

import costobs
from openai import OpenAI

costobs.configure(ingest_url="http://localhost:8080", api_key="costobs_dev_secret_key")

client = costobs.wrap(OpenAI(), team="payments", environment="prod")

with costobs.request_context(customer_id="cust-42", trace_id="trace-abc"):
    client.chat.completions.create(
        model="gpt-4o",
        messages=[...],
        feature="summarize",        # per-call metadata
        prompt_version="v3",
    )

costobs.flush()   # drain before short-lived scripts exit

Supported clients

OpenAI() / AsyncOpenAI() — chat, responses, embeddings, streaming (usage injected via include_usage)
OpenAI-compatible via base_url: xAI/Grok, Together, Fireworks, OpenRouter, Groq, DeepSeek, Mistral, Azure — the real provider is stamped automatically
anthropic.Anthropic() — incl. extended-thinking token accounting in streams
google-genai — Gemini
boto3.client("bedrock-runtime") — Converse API
costobs.wrap(litellm) — pass the module; provider derived per call from the routing prefix

Decorator

@costobs.trace(feature="search", team="discovery")
def handle_query(q): ...   # works on async functions too

TypeScript SDK

npm install @costobs/sdk

import OpenAI from "openai";
import { wrap, requestContext, flush } from "@costobs/sdk";

const client = wrap(new OpenAI(), { team: "payments", environment: "prod" });

await requestContext({ customer_id: "cust-42" }, async () => {
  await client.chat.completions.create({
    model: "gpt-4o",
    messages: [...],
    costobs: { feature: "summarize", promptVersion: "v3" },
  });
});

await flush();

Vercel AI SDK

import { generateText, streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { trackGenerateText, trackStreamText } from "@costobs/sdk";

const result = await trackGenerateText(
  generateText({ model: openai("gpt-4o"), prompt }),
  { feature: "summarize", customer_id: "cust-42" },
);

const stream = trackStreamText(streamText({ model, prompt }), { feature: "chat" });

Go SDK

Go takes the transport seam: wrap any provider SDK built on net/http with an observing RoundTripper. Streams are teed; unknown hosts pass through untouched.

co := costobs.New(costobs.Config{
    IngestURL: "http://localhost:8080", APIKey: "costobs_dev_secret_key",
    Team: "payments", Environment: "prod",
})
defer co.Close()

httpClient := &http.Client{Transport: co.Transport(nil)}
oai := openai.NewClient(option.WithHTTPClient(httpClient))

ctx = costobs.WithMetadata(ctx, costobs.Metadata{
    CustomerID: "cust-42", Feature: "chat",
    Tags: map[string]string{"pr": "1234"},
})
resp, err := oai.Chat.Completions.New(ctx, ...)

Metadata & attribution

Nine first-class fields map to event columns; anything else becomes a tag. Metadata is merged from three layers, and a later layer overrides an earlier one:

layer	python	typescript	go
wrap-time defaults	wrap(client, team=…)	wrap(client, {team})	Config{Team: …}
request scope	request_context() / @trace	requestContext()	WithMetadata(ctx, …)
per call	feature=, prompt_version=	costobs: {…}	—

Fields: customer_id, user_id, trace_id, feature, team, service, environment, prompt_key, prompt_version + free-form tags.
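
The layering can be pictured as a plain dictionary merge. The sketch below is illustrative only and is not the SDK's implementation: resolve_metadata and FIRST_CLASS_FIELDS are hypothetical names, and the assumption is simply that layers are applied in order and that any key outside the nine fields ends up as a string tag.

```python
# Hypothetical illustration of the three-layer merge; not the costobs SDK's code.
FIRST_CLASS_FIELDS = {
    "customer_id", "user_id", "trace_id", "feature", "team",
    "service", "environment", "prompt_key", "prompt_version",
}

def resolve_metadata(wrap_defaults, request_scope, per_call):
    """Merge the layers in order so that a later layer overrides an earlier one."""
    merged = {}
    for layer in (wrap_defaults, request_scope, per_call):
        merged.update(layer)
    # First-class fields map to columns; everything else becomes a tag.
    fields = {k: v for k, v in merged.items() if k in FIRST_CLASS_FIELDS}
    tags = {k: str(v) for k, v in merged.items() if k not in FIRST_CLASS_FIELDS}
    return fields, tags

fields, tags = resolve_metadata(
    {"team": "payments", "environment": "prod", "pr": "1234"},  # wrap-time defaults
    {"customer_id": "cust-42", "feature": "chat"},              # request scope
    {"feature": "summarize", "prompt_version": "v3"},           # per call
)
```

Here the per-call feature replaces the request-scope one, team and environment survive from the wrap-time defaults, and pr lands in the tags.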

record() — anything else

For providers without an adapter (Deepgram, ElevenLabs, gRPC APIs, batch jobs), record the call manually. It prices locally, merges the active context, and ships async like everything else:

costobs.record(
    provider="elevenlabs", model="eleven_multilingual_v2",
    operation="audio", characters=len(text),
    feature="narration", customer_id="cust-7",
)

costobs.record(
    provider="deepgram", model="nova-3",
    operation="audio", audio_seconds=resp.metadata.duration,
)

Pricing file

One versioned YAML (shared/pricing/) is the source of truth for SDKs and backend alike — per-token, cached-token, reasoning, tool, image-tier, audio-second, and per-character rates, all computed with exact decimal math. Edit it, bump version, run make sync-pricing — no SDK release needed. Unknown models are tracked at $0 until you add a rate, so nothing is silently lost.
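
To make the "exact decimal math" point concrete, here is a minimal sketch of a per-token cost estimate using Python's decimal module. It does not reflect the layout of the real pricing file: the RATES table, the estimate_cost function, and the provider and model names are hypothetical, and the rates are placeholder numbers rather than real provider prices.

```python
from decimal import Decimal

# Placeholder rates in USD per one million tokens; NOT real provider prices.
RATES = {
    ("example-provider", "example-model"): {
        "input": Decimal("2.50"),
        "output": Decimal("10.00"),
    },
}
PER_MILLION = Decimal(1_000_000)

def estimate_cost(provider, model, input_tokens, output_tokens):
    """Return the estimated cost as a Decimal; unknown models price at $0."""
    rate = RATES.get((provider, model))
    if rate is None:
        return Decimal("0")
    total = rate["input"] * input_tokens + rate["output"] * output_tokens
    return total / PER_MILLION

cost = estimate_cost("example-provider", "example-model", 1200, 300)
```

Using Decimal rather than float avoids binary rounding error accumulating across millions of small per-request amounts.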

Billing sync & reconciliation

SDK costs are estimates. billsyncd pulls actual daily bills from provider admin APIs into billed_daily; the Reconcile page shows drift per provider and untracked spend — billed cost no SDK event accounts for.
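
As a rough picture of what reconciliation compares, the sketch below computes drift per provider for one day. It is an assumption-laden illustration, not billsyncd's logic: the reconcile function is hypothetical, drift is taken to be billed minus estimated, and a provider is flagged as untracked only when it has billed spend and no SDK-estimated spend at all.

```python
from decimal import Decimal

def reconcile(estimated, billed):
    """Compare SDK-estimated and provider-billed totals per provider.

    Both arguments map provider name -> USD total for the same day.
    """
    report = {}
    for provider in sorted(set(estimated) | set(billed)):
        est = estimated.get(provider, Decimal("0"))
        bill = billed.get(provider, Decimal("0"))
        report[provider] = {
            "drift_usd": bill - est,
            # Assumed definition: billed spend with no SDK events behind it.
            "untracked": est == 0 and bill > 0,
        }
    return report

report = reconcile(
    {"openai": Decimal("10.00"), "anthropic": Decimal("4.00")},
    {"openai": Decimal("10.50"), "gemini": Decimal("12.34")},
)
```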

Opt-in by design

billsyncd is the only CostObs component that makes outbound calls, and only when you set a key. Without keys it idles, fully offline.

# .env
COSTOBS_BILLING_OPENAI_ADMIN_KEY=sk-admin-...
COSTOBS_BILLING_ANTHROPIC_ADMIN_KEY=sk-ant-admin-...

Air-gapped or unsupported provider? Import invoices straight into the table:

INSERT INTO billed_daily (org_id, provider, date, billed_usd, source)
VALUES ('default', 'gemini', '2026-06-01', 12.34, 'import');

Cost per PR / engineer / experiment

Tag calls in CI, then slice by any tag on the Breakdown page (/breakdown?tag=pr):

# .github/workflows/preview.yml
env:
  PR_NUMBER: ${{ github.event.pull_request.number }}
  PR_AUTHOR: ${{ github.event.pull_request.user.login }}

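import os

import costobs
from openai import OpenAI

# Keyword arguments that are not first-class fields (pr, engineer) become tags.
client = costobs.wrap(
    OpenAI(), environment="preview",
    pr=os.environ.get("PR_NUMBER", ""),
    engineer=os.environ.get("PR_AUTHOR", ""),
)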

Alerts

Three rule kinds, evaluated by alertd on an interval, delivered to Slack or any webhook with per-rule cooldowns:

kind	fires when	config
daily_threshold	today's spend > limit	{"threshold_usd": 50}
spike	today > factor × baseline	{"factor": 2, "baseline": "rolling_7d"}
monthly_budget	month-to-date > budget	{"budget_usd": 1000}

Rules are rows in Postgres (alert_rules + alert_targets), scopeable to any combination of provider, model, feature, team, customer, user, or environment.
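
The three conditions in the table reduce to simple comparisons. The following is a minimal sketch, not alertd's code: rule_fires is a hypothetical function, and baseline_usd stands in for whatever the rule's baseline setting (such as rolling_7d) resolves to, which is not specified here.

```python
def rule_fires(kind, config, today_usd, month_to_date_usd, baseline_usd):
    """Return True when a rule of the given kind should fire.

    baseline_usd is assumed to be precomputed from the rule's "baseline"
    setting; how the evaluator derives it is outside this sketch.
    """
    if kind == "daily_threshold":
        return today_usd > config["threshold_usd"]
    if kind == "spike":
        return today_usd > config["factor"] * baseline_usd
    if kind == "monthly_budget":
        return month_to_date_usd > config["budget_usd"]
    raise ValueError(f"unknown rule kind: {kind}")

# Example: today's spend of 41 against a baseline of 20 with factor 2.
spiking = rule_fires("spike", {"factor": 2, "baseline": "rolling_7d"}, 41, 300, 20)
```

Scoping and cooldowns would sit around a check like this: filter spend to the rule's scope before evaluating, and skip delivery if the rule fired within its cooldown window.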

Self-hosting

Mode A — Docker Compose

This is the quickstart above: a single machine with bundled single-node ClickHouse and Postgres, and data in named volumes. It is good for up to roughly 10M events on a laptop.

Mode B — Kubernetes (Helm)

helm install costobs deploy/helm/costobs \
  -f deploy/k8s-examples/values-small.yaml \
  --namespace costobs --create-namespace

Scaling, TLS/ingress, OIDC, and bring-your-own external Postgres + ClickHouse are covered in docs/self-hosting.md. Every service exposes /healthz and Prometheus /metrics.

Architecture

YOUR APP ──(real call, sync)──────────────► LLM PROVIDER
   │  costobs SDK: local cost calc, build event (NO content)
   ▼
background queue ──(async batched POST)──► ingest (Go :8080)
                                              │ auth + dedup
                        ┌─────────────────────┴─────────────┐
                        ▼                                   ▼
            ClickHouse (events + rollups)        Postgres (orgs, keys, rules)
                        │                                   │
                        ▼                                   │
            dashboard (Next.js :3000) ◄─────────────────────┘
            alertd (Go :8081) → Slack / webhook
            billsyncd (Go :8082, opt-in) → billed_daily

ClickHouse materialized views maintain rollups (cost_daily, cost_attr_hourly, prompt_version_daily) so every dashboard page except the raw-request drill-down stays sub-second at 10M+ events.
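
The "background queue" step in the diagram is what keeps observation off the hot path. A minimal sketch of that pattern follows, under stated assumptions: BatchingQueue is a hypothetical class, not the SDK's implementation, and send is any callable that would POST a batch to the ingest API.

```python
import queue
import threading

class BatchingQueue:
    """Hypothetical sketch: buffer events and hand them to `send` in batches."""

    def __init__(self, send, batch_size=100):
        self._send = send              # callable taking a list of events
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def put(self, event):
        # Called from application code: enqueue and return immediately.
        self._queue.put(event)

    def _run(self):
        while True:
            batch = [self._queue.get()]          # block until work arrives
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            try:
                self._send(batch)
            except Exception:
                pass  # a real client would log or retry; this sketch drops the batch
            finally:
                for _ in batch:
                    self._queue.task_done()

    def flush(self):
        # Block until every queued event has been handed to `send`.
        self._queue.join()

sent_batches = []
events = BatchingQueue(sent_batches.append, batch_size=2)
for n in range(5):
    events.put({"n": n})
events.flush()
```

The daemon worker thread is why short-lived scripts need an explicit flush before exiting: without it, the process can end while events are still queued.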

Privacy guarantees

No prompt/response content — the wire schema has no field for it; it cannot be stored accidentally.
No phone home — no telemetry, no license checks, no cloud control plane.
No proxy — provider calls never route through CostObs; observation is off the hot path.
One audited exception — billsyncd calls provider billing APIs, only when you set its keys.
