Pulse observability
p50/p95/p99 + per-route metrics.
You can't fix what you can't see. Pulse is Grit's observability dashboard: p50/p95/p99 latency, RPS per route, per-handler timings, and a slow-query log. No Datadog bill, no separate infra. Mounted at /pulse/ui in every Grit API.
What you see
- Live request rate (last 1 min / hour / day)
- Latency percentiles per route: p50, p95, p99
- Error rates per route (4xx / 5xx breakdown)
- Slow query log (queries >100ms)
- System metrics: Go runtime, goroutine count, GC stats
Why percentiles, not averages
An average hides outliers. If your endpoint serves 98 fast requests (around 10ms each) and 2 slow ones (2,000ms each), the average is about 50ms, which looks fine. But for that 2% of users, the experience is unusable. p99 catches it.
Average  ~50ms → looks fine
p50       10ms → typical
p95       12ms → still fine
p99    2,000ms → someone is having a bad time
The four numbers to watch
- p50: typical experience. Should be <100ms for most CRUD endpoints.
- p95: almost everyone. <500ms is a healthy ceiling.
- p99: the worst 1%. Keep it under 2s; over that, something is wrong.
- Error rate: should be 0% for healthy endpoints. A sudden spike usually means a regression.
The slow query log
Queries that take >100ms get logged. Grit's Pulse dashboard groups them by SQL signature (the query text with its literal values replaced by ?), so you see which query is slow, not just that something is.
Slow queries (last hour):
1. SELECT * FROM orders WHERE customer_id = ?   avg 350ms, 12 calls → add an index
2. SELECT * FROM users JOIN ...                 avg 800ms, 1 call → spike, investigate
3. SELECT count(*) FROM activities              avg 1200ms, 4 calls → needs covering index
The most common fix is an index. Add gorm:"index" to the field you filter on, run grit migrate, problem solved.
Set PULSE_STORAGE=postgres when you scale out.
What Pulse doesn't do
Pulse is in-process, running in the same binary as your API. That has trade-offs:
- No cross-service tracing. If you have multiple Grit APIs talking to each other, you can't follow a request across them. Use OpenTelemetry for that.
- Metrics are per-replica. If you run 3 API containers, each has its own Pulse view. With the Postgres backend, you can merge them by pointing all 3 at the same database.
- Last 24 hours by default. Older data ages out. For long-term retention, export to Prometheus / Datadog.
For most products, single-binary Pulse is enough. You can always graduate.
Prometheus export: when you grow up
Pulse exposes /pulse/metrics in Prometheus format. Add Prometheus + Grafana, scrape this endpoint, and you have proper long-term metrics. Use the Pulse UI for day-to-day; Prometheus for historical analysis.
Try it
Load-test your API to populate Pulse:
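- Make sure Pulse is on (PULSE_ENABLED=true).
- Run ab or wrk against /api/health. With ab, this sends 1,000 requests, 10 at a time:
  $ ab -n 1000 -c 10 http://localhost:8080/api/health
  To run for about 30 seconds instead of a fixed count, use ab -t 30 -c 10 or wrk -c 10 -d 30s with the same URL.
- Open /pulse/ui. You should see the spike and the p50/p95/p99 numbers.
- Screenshot the route view and paste it into notes.md.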
What's next
Last lesson of the chapter: the tamper-evident audit log. Every mutation is hashed into a chain so retroactive tampering breaks verification. The compliance team will love you.