Pulse observability

p50/p95/p99 + per-route metrics.

7 min · easy

You can't fix what you can't see. Pulse is Grit's observability dashboard — p50/p95/p99 latency, RPS per route, per-handler timings, slow-query log. No Datadog bill, no separate infra. Mounted at /pulse/ui in every Grit API.

What you see

Live request rate (last 1 min / hour / day)
Latency percentiles per route — p50, p95, p99
Error rates per route (4xx / 5xx breakdown)
Slow query log (queries >100ms)
System metrics — Go runtime, goroutine count, GC stats

Why percentiles, not averages

An average hides outliers. If your endpoint serves 99 fast requests (10ms each) and 1 slow one (2,000ms), the average is 30ms — looks fine. But for that 1% of users, the experience is unusable. p99 catches it.

Average     30ms     ← looks fine
p50         10ms     ← typical
p95         10ms     ← still fine
p99         2,000ms  ← someone is having a bad time

The four numbers to watch

p50 — typical experience. Should be <100ms for most CRUD endpoints.
p95 — almost everyone. <500ms is a healthy ceiling.
p99 — the worst 1%. <2s; over that, something is wrong.
Error rate — should be 0% for healthy endpoints. A spike means a regression.

The slow query log

Queries that take >100ms get logged. Grit's Pulse dashboard groups them by SQL signature so you see which query is slow, not just that something is.

Slow queries (last hour):
1. SELECT * FROM orders WHERE customer_id = ?    avg 350ms, 12 calls   ← add an index
2. SELECT * FROM users JOIN ...                  avg 800ms, 1 call     ← spike, investigate
3. SELECT count(*) FROM activities               avg 1200ms, 4 calls   ← needs covering index

The most common fix is an index. Add gorm:"index" to the field, run grit migrate, problem solved.

SQLite vs Postgres storage: Pulse can store its metrics in SQLite (zero infra) or Postgres (shared across replicas). Default is SQLite. Set PULSE_STORAGE=postgres when you scale out.

What Pulse doesn't do

Pulse is in-process — same binary as your API. That has trade-offs:

No cross-service tracing. If you have multiple Grit APIs talking to each other, you can't follow a request across them. Use OpenTelemetry for that.
Metrics are per-replica. If you run 3 API containers, each has its own Pulse view. With the Postgres backend, pointing all 3 at the same database lets Pulse merge them into one view.
Last 24 hours by default. Older data ages out. For long-term retention, export to Prometheus / Datadog.

For most products, single-binary Pulse is enough. You can always graduate.

Prometheus export — when you grow up

Pulse exposes /pulse/metrics in Prometheus format. Add Prometheus + Grafana, scrape this endpoint, and you have proper long-term metrics. Use the Pulse UI for day-to-day; Prometheus for historical analysis.

Quick check

Pulse shows your `/api/orders` endpoint at p50=15ms, p95=50ms, p99=1.8s. What's the most likely cause?

Try it

Load-test your API to populate Pulse:

Make sure Pulse is on (PULSE_ENABLED=true).
Run ab or wrk against /api/health. For example, 1,000 requests at concurrency 10 with ab:
```
ab -n 1000 -c 10 http://localhost:8080/api/health
```
Open /pulse/ui. You should see the spike in request rate and the p50/p95/p99 numbers for /api/health.
Screenshot the route view and paste it in notes.md.

What's next

Last lesson of the chapter — the tamper-evident audit log. Every mutation is hashed into a chain so retroactive tampering breaks verification. The compliance team will love you.
