Reading the output
ns/op, B/op, allocs/op — what each tells you.
You wrote a benchmark and got numbers. Now make them mean something. This lesson builds fluency with ns/op, B/op, and allocs/op: what each tells you, when each matters, and what red flags to look for.
ns/op — the headline
Nanoseconds per operation. The number you optimise for, most of the time.
- Under 100ns — fast. Probably a pure function with no syscalls.
- 100ns – 1µs (1000ns) — typical for string ops, simple JSON marshal, small loops.
- 1µs – 100µs — non-trivial work. Allocation, regex, complex parsing.
- 100µs – 1ms — usually some I/O or a big loop. If you didn't expect this, investigate.
- 1ms+ — likely the wrong target for microbenchmarking. Profile the broader system instead.
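To make these bands concrete, here is a minimal sketch that converts an ns/op figure into single-threaded throughput and labels it with the rough bands listed above. The names `opsPerSecond` and `band` are made up for illustration, the sample values are arbitrary, and the thresholds are this lesson's rules of thumb rather than anything the Go toolchain defines.

```go
package main

import (
	"fmt"
	"time"
)

// opsPerSecond converts an ns/op figure into single-threaded throughput.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

// band labels a per-op cost using the rough thresholds from this lesson.
func band(perOp time.Duration) string {
	switch {
	case perOp < 100*time.Nanosecond:
		return "fast"
	case perOp < time.Microsecond:
		return "typical"
	case perOp < 100*time.Microsecond:
		return "non-trivial"
	case perOp < time.Millisecond:
		return "investigate"
	default:
		return "profile the system instead"
	}
}

func main() {
	// Arbitrary sample ns/op values.
	for _, ns := range []float64{151, 226, 2385} {
		d := time.Duration(ns) * time.Nanosecond
		fmt.Printf("%6.0f ns/op  ~%.1fM ops/sec  %s\n", ns, opsPerSecond(ns)/1e6, band(d))
	}
}
```

Thinking in operations per second is often easier when you compare a benchmark against a known request rate.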
B/op — bytes per op
How many bytes of heap memory the operation allocates per iteration.
- 0 B/op — zero allocations. Often achievable for pure ops with stack-only data. The dream for hot paths.
- 16-64 B/op — typical for ops that build a small string, return a struct on the heap, or take a slice.
- 1KB+ /op — large; either you copied data or allocated a buffer. Usually a flag for optimisation.
allocs/op — number of heap allocations
Sometimes more important than bytes. Each allocation:
- Costs CPU to satisfy.
- Increases the work the GC has to do later.
- Brings the next GC cycle closer, and every cycle costs CPU time and brief pauses.
For libraries used a million times per second (logging, request routing), driving allocs/op to 0 or 1 is a common target.
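The three metrics can also be read programmatically. The sketch below uses `testing.Benchmark` from the standard library to run a benchmark outside `go test` and print ns/op, B/op, and allocs/op from the returned `testing.BenchmarkResult`. The workload (`strings.Repeat`) and the helper name `measure` are placeholders chosen for illustration, not code from this course.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the compiler cannot discard the result.
var sink string

// measure runs a tiny benchmark without `go test` and returns the raw result.
func measure() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		// In a `go test` benchmark, ReportAllocs makes B/op and allocs/op
		// print even when -benchmem is not passed.
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = strings.Repeat("ab", 8) // placeholder workload that must allocate
		}
	})
}

func main() {
	r := measure()
	fmt.Printf("%d iterations\n", r.N)
	fmt.Printf("%d ns/op\n", r.NsPerOp())
	fmt.Printf("%d B/op\n", r.AllocedBytesPerOp())
	fmt.Printf("%d allocs/op\n", r.AllocsPerOp())
}
```

In everyday use you would more likely run `go test -bench=. -benchmem` and read the same three columns from the terminal.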
Reading a realistic output
    BenchmarkPluralize-10       5102937     226 ns/op     64 B/op     2 allocs/op
    BenchmarkGoType-10          7891234     151 ns/op     48 B/op     1 allocs/op
    BenchmarkZodType-10          892145    1342 ns/op    256 B/op     8 allocs/op   ← red flag
    BenchmarkGORMTag-10         2912034     411 ns/op    128 B/op     3 allocs/op
    BenchmarkInjectBefore-10     503456    2385 ns/op    512 B/op    12 allocs/op   ← bigger flag
    BenchmarkParseInline-10     1056432    1147 ns/op    192 B/op     5 allocs/op
Two red flags worth noting:
- BenchmarkZodType at 1.3µs with 8 allocs. Likely building intermediate strings. strings.Builder would drop both numbers.
- BenchmarkInjectBefore at 2.4µs with 12 allocs. 12 allocations PER OP is a lot, almost certainly a string search-and-replace with many intermediate values. A scanner or single buffer would help.
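If the columns are still unfamiliar, it can help to see them split apart. The following is a rough sketch, not a replacement for benchstat: it parses one output line into named fields. The `result` type and `parseLine` function are hypothetical helpers. The `-10` suffix on the benchmark name is the GOMAXPROCS value during the run, and the second column is the iteration count (`b.N`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// result holds the columns of one benchmark output line.
type result struct {
	Name        string // e.g. "BenchmarkZodType"
	Procs       int    // the -N suffix: GOMAXPROCS during the run
	Iterations  int    // how many times the loop body ran (b.N)
	NsPerOp     float64
	BytesPerOp  int
	AllocsPerOp int
}

// parseLine splits a line such as
// "BenchmarkZodType-10 892145 1342 ns/op 256 B/op 8 allocs/op".
func parseLine(line string) (result, error) {
	f := strings.Fields(line)
	if len(f) < 4 {
		return result{}, fmt.Errorf("too few fields in %q", line)
	}
	var r result
	var err error
	r.Name = f[0]
	if i := strings.LastIndex(f[0], "-"); i >= 0 {
		if p, perr := strconv.Atoi(f[0][i+1:]); perr == nil {
			r.Name, r.Procs = f[0][:i], p
		}
	}
	if r.Iterations, err = strconv.Atoi(f[1]); err != nil {
		return result{}, err
	}
	// The remaining fields come in "value unit" pairs.
	for i := 2; i+1 < len(f); i += 2 {
		switch f[i+1] {
		case "ns/op":
			r.NsPerOp, err = strconv.ParseFloat(f[i], 64)
		case "B/op":
			r.BytesPerOp, err = strconv.Atoi(f[i])
		case "allocs/op":
			r.AllocsPerOp, err = strconv.Atoi(f[i])
		}
		if err != nil {
			return result{}, err
		}
	}
	return r, nil
}

func main() {
	r, err := parseLine("BenchmarkInjectBefore-10 503456 2385 ns/op 512 B/op 12 allocs/op")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```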
Spotting hidden allocations
    import "strings"

    // Allocates a new string on every call (strings are immutable in Go).
    func concat1(s string) string {
        return s + " · v1"
    }
    // Bench: 30 ns/op, 16 B/op, 1 allocs/op

    // Allocates a new, longer string on every += and copies everything built so far.
    func concat2(words []string) string {
        out := ""
        for _, w := range words {
            out += w + " "
        }
        return out
    }
    // Bench (10 words): 290 ns/op, 192 B/op, 9 allocs/op ← yikes

    // strings.Builder appends into one growing buffer.
    func concat3(words []string) string {
        var b strings.Builder
        for _, w := range words {
            b.WriteString(w)
            b.WriteByte(' ')
        }
        return b.String()
    }
    // Bench (10 words): 110 ns/op, 32 B/op, 2 allocs/op ← much better
Same logical work, about 2.6x faster (290 → 110 ns/op), with 6x fewer bytes (192 → 32 B/op) and 4.5x fewer allocations (9 → 2 allocs/op). The benchmark numbers told you exactly where the inefficiency was.
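When you only care about the allocation count, `testing.AllocsPerRun` from the standard library gives a quick check without a full benchmark run. The sketch below compares a `+=` loop against a `strings.Builder` loop. `joinNaive`, `joinBuilder`, and `allocs` are illustrative names, and the exact counts depend on the Go version and the word lengths, so no expected output is shown.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results alive so the calls cannot be optimised away.
var sink string

// joinNaive builds a fresh string on every iteration.
func joinNaive(words []string) string {
	out := ""
	for _, w := range words {
		out += w + " "
	}
	return out
}

// joinBuilder appends into a single growing buffer.
func joinBuilder(words []string) string {
	var b strings.Builder
	for _, w := range words {
		b.WriteString(w)
		b.WriteByte(' ')
	}
	return b.String()
}

// allocs reports the average number of heap allocations per call of join.
func allocs(join func([]string) string, words []string) float64 {
	return testing.AllocsPerRun(100, func() {
		sink = join(words)
	})
}

func main() {
	words := strings.Fields("one two three four five six seven eight nine ten")
	fmt.Printf("naive:   %.0f allocs per call\n", allocs(joinNaive, words))
	fmt.Printf("builder: %.0f allocs per call\n", allocs(joinBuilder, words))
}
```

If the total length is known up front, calling `b.Grow(n)` before the loop is one further option to try, since it lets the builder reserve its buffer once.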
What "noise" looks like
    # Run 5 times, results swinging by almost 40%:
    BenchmarkX-10    5102937    226 ns/op
    BenchmarkX-10    4992014    289 ns/op
    BenchmarkX-10    5023412    232 ns/op
    BenchmarkX-10    4123912    312 ns/op
    BenchmarkX-10    5202345    228 ns/op
Noise. Your laptop is doing something else under load (browser, Spotlight, antivirus). Don't trust these numbers. Fix by:
- Closing background apps.
- Running on a quiet machine (CI, dedicated box).
- Using -count=10 + benchstat (chapter 3).
- Avoiding battery / thermal throttling.
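To put a number on the noise, one simple measure is the spread between the fastest and slowest run relative to the fastest. This is a rough sketch only: `spread` is a hypothetical helper, and benchstat does a proper statistical comparison.

```go
package main

import "fmt"

// spread returns (max - min) / min for a set of ns/op samples.
func spread(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	lo, hi := samples[0], samples[0]
	for _, s := range samples[1:] {
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	return (hi - lo) / lo
}

func main() {
	runs := []float64{226, 289, 232, 312, 228} // the five BenchmarkX runs from this section
	fmt.Printf("spread: %.0f%%\n", spread(runs)*100) // prints "spread: 38%"
}
```

If the spread is larger than the improvement you are trying to measure, the comparison is not yet trustworthy.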
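Noise is not the only way to get numbers you can't trust. If the result of a call is never used, the compiler may optimise the call away, and the benchmark ends up timing an empty loop. Storing the result in a package-level variable prevents that:

    import "testing"

    var sink string // package-level

    func BenchmarkPluralize(b *testing.B) {
        var s string
        for i := 0; i < b.N; i++ {
            s = Pluralize("cat")
        }
        sink = s // prevent dead-code elimination
    }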
What ns/op alone can't tell you
Benchmarks measure ONE thing in isolation. They don't tell you:
- How often this function is called. A 5µs function called 10× a day is fine. A 50ns function called a million times per request matters.
- What the rest of the system is doing while it runs. Lock contention, GC pauses, network jitter all dwarf microbenchmark differences.
- Real-world data shapes. Your benchmark hits one input; production hits a distribution.
Profiling (next chapter) helps with the "how often" question. Load testing (the K6 course) helps with the "real world" question. Use the right tool for the right question.
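The call-frequency point above is simple arithmetic, sketched here with `time.Duration`. The helper name `totalCost` is made up for illustration, and the figures are the ones from the first bullet.

```go
package main

import (
	"fmt"
	"time"
)

// totalCost is the time spent in a function that costs perOp and runs calls times.
func totalCost(perOp time.Duration, calls int64) time.Duration {
	return perOp * time.Duration(calls)
}

func main() {
	slowButRare := totalCost(5*time.Microsecond, 10)       // 10 calls a day
	fastButHot := totalCost(50*time.Nanosecond, 1_000_000) // 1M calls per request

	fmt.Println("5µs x 10 per day:", slowButRare)      // 50µs in total
	fmt.Println("50ns x 1M per request:", fastButHot) // 50ms in total
}
```

The "slow" function costs 50µs a day, while the "fast" one adds 50ms to every request, which is why call counts from a profile matter as much as ns/op.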
Try it
Interpret real numbers:
- Take the benchmarks you wrote last lesson. Look at each B/op number.
- Find the benchmark with the highest allocs/op. Read the source. Make ONE hypothesis about why it allocates that much.
- Try one quick change (use strings.Builder, pre-allocate a slice with make([]T, 0, cap), or avoid an intermediate string). Re-run.
- Did the numbers move? Write down: how much, in which direction, and your guess at why.
- Paste before / after in notes.md.
What's next
Chapter 2 — Profiling with pprof. Benchmarks tell you ONE function's cost; pprof tells you where a whole program spends its time. The CPU + heap flame graph is the highest-bandwidth tool in Go.