Pulse observability

p50/p95/p99 + per-route metrics.

7 min · easy

You can't fix what you can't see. Pulse is Grit's observability dashboard — p50/p95/p99 latency, RPS per route, per-handler timings, slow-query log. No Datadog bill, no separate infra. Mounted at /pulse/ui in every Grit API.

What you see

  • Live request rate (last 1 min / hour / day)
  • Latency percentiles per route — p50, p95, p99
  • Error rates per route (4xx / 5xx breakdown)
  • Slow query log (queries >100ms)
  • System metrics — Go runtime, goroutine count, GC stats

Why percentiles, not averages

An average hides outliers. If your endpoint serves 98 fast requests (about 10ms each) and 2 slow ones (2,000ms each), the average is about 50ms, which looks fine. But for the 2% of requests that hit the slow path, the experience is unusable. p99 catches it.

Average    50ms     ← looks fine
p50        10ms     ← typical
p95        12ms     ← still fine
p99        2,000ms  ← someone is having a bad time

The four numbers to watch

  • p50 — typical experience. Should be <100ms for most CRUD endpoints.
  • p95 — almost everyone. <500ms is a healthy ceiling.
  • p99 — the worst 1%. <2s; over that, something is wrong.
  • Error rate — should be 0% for healthy endpoints. A spike means a regression.

The slow query log

Queries that take >100ms get logged. Grit's Pulse dashboard groups them by SQL signature so you see which query is slow, not just that something is.

Slow queries (last hour):
  1. SELECT * FROM orders WHERE customer_id = ?    avg 350ms,  12 calls  ← add an index
  2. SELECT * FROM users JOIN ...                  avg 800ms,  1 call    ← spike, investigate
  3. SELECT count(*) FROM activities               avg 1200ms, 4 calls   ← needs covering index
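Grouping by SQL signature means treating queries that differ only in their literal values as the same query. A rough sketch of the idea follows, assuming a naive regex-based normalizer. The signature and countBySignature functions are hypothetical, and Pulse's actual normalization may work differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	stringLiteral = regexp.MustCompile(`'[^']*'`)         // 'abc'
	numberLiteral = regexp.MustCompile(`\b\d+(\.\d+)?\b`) // 42 or 3.14
	whitespace    = regexp.MustCompile(`\s+`)
)

// signature replaces literal values with ? and collapses whitespace,
// so queries that differ only in their parameters share one key.
// It is deliberately naive: it does not handle escaped quotes or comments.
func signature(query string) string {
	q := stringLiteral.ReplaceAllString(query, "?")
	q = numberLiteral.ReplaceAllString(q, "?")
	q = whitespace.ReplaceAllString(q, " ")
	return strings.TrimSpace(q)
}

// countBySignature counts how many logged queries fall under each signature.
func countBySignature(queries []string) map[string]int {
	counts := make(map[string]int)
	for _, q := range queries {
		counts[signature(q)]++
	}
	return counts
}

func main() {
	// Hypothetical slow-query log entries.
	logged := []string{
		"SELECT * FROM orders WHERE customer_id = 42",
		"SELECT * FROM orders  WHERE customer_id = 7",
		"SELECT * FROM users WHERE email = 'a@example.com'",
	}
	for sig, n := range countBySignature(logged) {
		fmt.Printf("%d call(s): %s\n", n, sig)
	}
}
```

With this kind of grouping, two calls with different customer IDs show up as one entry with a call count of 2, which is what makes the repeated slow query stand out.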

The most common fix is an index. Add gorm:"index" to the field the slow query filters on, run grit migrate, problem solved.
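For the first slow query above, the tag would go on the customer ID field. The model below is a hypothetical sketch: the struct and its other fields are assumptions, and only the gorm:"index" tag comes from this lesson. The reflect call in main simply shows that the tag is attached to the field.

```go
package main

import (
	"fmt"
	"reflect"
)

// Order is a hypothetical model; field names are assumptions for illustration.
type Order struct {
	ID         uint `gorm:"primaryKey"`
	CustomerID uint `gorm:"index"` // index for WHERE customer_id = ? lookups
	TotalCents int64
}

// gormTag returns the gorm struct tag of the named field on Order,
// or an empty string if the field does not exist.
func gormTag(field string) string {
	f, ok := reflect.TypeOf(Order{}).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("gorm")
}

func main() {
	fmt.Println("CustomerID tag:", gormTag("CustomerID"))
}
```

GORM's default naming maps CustomerID to the customer_id column, so the index lands on the column the query filters by.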

SQLite vs Postgres storage: Pulse can store its metrics in SQLite (zero infra) or Postgres (shared across replicas). Default is SQLite. Set PULSE_STORAGE=postgres when you scale out.

What Pulse doesn't do

Pulse is in-process — same binary as your API. That has trade-offs:

  • No cross-service tracing. If you have multiple Grit APIs talking to each other, you can't follow a request across them. Use OpenTelemetry for that.
  • Metrics are per-replica. If you run 3 API containers, each has its own Pulse view. With the Postgres backend, pointing all 3 at the same DB merges them into one view.
  • Last 24 hours by default. Older data ages out. For long-term retention, export to Prometheus / Datadog.

For most products, single-binary Pulse is enough. You can always graduate.

Prometheus export — when you grow up

Pulse exposes /pulse/metrics in Prometheus format. Add Prometheus + Grafana, scrape this endpoint, and you have proper long-term metrics. Use the Pulse UI for day-to-day; Prometheus for historical analysis.
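If you want to peek at the endpoint's output before setting up Prometheus, the text format is simple enough to read by hand: comment lines start with #, and every other line is a series name (optionally with labels) followed by a value. The sketch below parses that shape from a string. The metric names in the sample are invented for illustration and are not Pulse's actual metric names, and the parser is deliberately naive (it does not handle braces inside label values).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sampleExposition is hypothetical output in Prometheus text format.
// The metric names are made up; check /pulse/metrics for the real ones.
const sampleExposition = `
# HELP http_requests_total Total requests per route.
# TYPE http_requests_total counter
http_requests_total{route="/api/orders"} 1027
http_request_duration_seconds{route="/api/orders",quantile="0.99"} 1.8
go_goroutines 42
`

// parseMetrics maps each series (name plus labels) to its value.
// Comment lines and lines it cannot parse are skipped.
func parseMetrics(text string) map[string]float64 {
	out := make(map[string]float64)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		series, rest := line, ""
		if i := strings.LastIndex(line, "}"); i >= 0 {
			series, rest = line[:i+1], line[i+1:] // labeled series
		} else if i := strings.IndexAny(line, " \t"); i >= 0 {
			series, rest = line[:i], line[i:] // bare metric name
		}
		fields := strings.Fields(rest) // value, then an optional timestamp
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		out[series] = v
	}
	return out
}

func main() {
	metrics := parseMetrics(sampleExposition)
	fmt.Println("series parsed:", len(metrics))
	fmt.Println("goroutines:", metrics["go_goroutines"])
}
```

In a real setup Prometheus does this parsing for you; the sketch is only meant to show what a scrape of the endpoint contains.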

Quick check

Pulse shows your `/api/orders` endpoint at p50=15ms, p95=50ms, p99=1.8s. What's the most likely cause?

Try it

Load-test your API to populate Pulse:

  1. Make sure Pulse is on (PULSE_ENABLED=true).
  2. Run ab or wrk against /api/health. For example, 1,000 requests at a concurrency of 10 with ab:
    $ ab -n 1000 -c 10 http://localhost:8080/api/health
  3. Open /pulse/ui. You should see the spike + the p50/p95/p99 numbers.
  4. Screenshot the route view and paste it in notes.md.

What's next

Last lesson of the chapter — the tamper-evident audit log. Every mutation is hashed into a chain so retroactive tampering breaks verification. The compliance team will love you.
