Your first benchmark

A `BenchmarkPluralize` in 10 lines.

7 min · easy

Go has the cleanest built-in benchmarking I've ever used. Write a function whose name starts with Benchmark, loop over b.N, run `go test -bench=.` and you're done. This lesson is that whole loop, end to end.

The shape of a benchmark

internal/generate/pluralize_test.go
package generate

import "testing"

func BenchmarkPluralize(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Pluralize("category")
	}
}

Three things to absorb:

  • The name starts with Benchmark, which is how the go test tool finds it.
  • It takes a *testing.B.
  • It loops b.N times. Go decides b.N at runtime: it ramps b.N up until the benchmark runs long enough to give a stable measurement (one second by default, adjustable with -benchtime).
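
To watch the ramp-up happen, here is a minimal sketch that records every b.N the framework tries. It uses testing.Benchmark so it can run as a plain program, and pluralize is a hypothetical stand-in for the lesson's Pluralize, not the real implementation.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// pluralize is a hypothetical stand-in for the lesson's Pluralize function.
func pluralize(word string) string {
	if strings.HasSuffix(word, "y") {
		return strings.TrimSuffix(word, "y") + "ies"
	}
	return word + "s"
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// triedN records the b.N value of every call the framework makes.
var triedN []int

func benchmarkPluralize(b *testing.B) {
	triedN = append(triedN, b.N)
	for i := 0; i < b.N; i++ {
		sink = pluralize("category")
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	result := testing.Benchmark(benchmarkPluralize)
	fmt.Println("b.N values tried:", triedN)
	fmt.Println("final b.N:", result.N)
}
```

The benchmark function is called several times, not once: the list typically starts at 1 and grows until a single call takes about a second. The exact values depend on your machine.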

Run it

go test -bench=BenchmarkPluralize -benchmem -count=3 ./internal/generate/

Three flags worth knowing:

  • -bench=BenchmarkXxx: which benchmarks to run. The value is a regular expression, so -bench=. matches all.
  • -benchmem: also report bytes/op and allocs/op. Use it every time.
  • -count=3: run each benchmark 3 times. You want this because single runs are noisy.

The output

goos: darwin
goarch: arm64
pkg: github.com/MUKE-coder/grit/internal/generate
BenchmarkPluralize-10 5012461 228.4 ns/op 64 B/op 2 allocs/op
BenchmarkPluralize-10 5102937 226.1 ns/op 64 B/op 2 allocs/op
BenchmarkPluralize-10 5098102 227.8 ns/op 64 B/op 2 allocs/op
PASS
ok github.com/MUKE-coder/grit/internal/generate 4.523s

Decode each column:

  • BenchmarkPluralize-10: the -10 suffix is the GOMAXPROCS value the benchmark ran with, which defaults to the number of logical CPUs. If it differs between two runs, they ran on different hardware or with different settings.
  • 5012461: the b.N Go settled on. ~5 million iterations.
  • 228.4 ns/op: average nanoseconds per operation. THIS is the headline metric.
  • 64 B/op: bytes allocated per op. Lower = less GC pressure.
  • 2 allocs/op: number of heap allocations per op. Lower = even less GC pressure.
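
The same columns are available programmatically. This sketch, assuming a hypothetical pluralize stand-in, runs a benchmark with testing.Benchmark and reads each column off the returned testing.BenchmarkResult.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// pluralize is a hypothetical stand-in for the lesson's Pluralize function.
func pluralize(word string) string {
	if strings.HasSuffix(word, "y") {
		return strings.TrimSuffix(word, "y") + "ies"
	}
	return word + "s"
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// runPluralizeBenchmark runs the benchmark and returns the raw result.
func runPluralizeBenchmark() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // same effect as -benchmem, for this benchmark only
		for i := 0; i < b.N; i++ {
			sink = pluralize("category")
		}
	})
}

func main() {
	result := runPluralizeBenchmark()
	fmt.Println("iterations (b.N):", result.N)
	fmt.Println("ns/op:", result.NsPerOp())
	fmt.Println("B/op:", result.AllocedBytesPerOp())
	fmt.Println("allocs/op:", result.AllocsPerOp())
}
```

The numbers you get will differ from the output above, since the stand-in does less work than the real function and your machine is not mine.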

Why "per op" not "total time"

b.N changes between runs (or even between machines), so absolute total time is meaningless. What matters is the per-op average, which IS comparable across runs and machines (with caveats; see the note on noise below).

Multiple benchmarks in one file

func BenchmarkPluralize_short(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Pluralize("dog")
	}
}

func BenchmarkPluralize_long(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Pluralize("administrator")
	}
}

func BenchmarkPluralize_irregular(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Pluralize("child")
	}
}

Compare different input shapes side by side. Often the revelation: "long words are 3x slower", "irregular words trigger a fallback path".

What about setup that shouldn't count?

func BenchmarkParseInline(b *testing.B) {
	// Setup: NOT measured
	input := buildLargeStringFromFile()

	b.ResetTimer() // restart the clock from here

	for i := 0; i < b.N; i++ {
		ParseInlineFields(input)
	}
}

b.ResetTimer() says "everything before this is setup; start measuring now". Critical for benchmarks where the setup is expensive.
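
To see the difference in numbers, here is a rough sketch that runs the same loop with and without b.ResetTimer(). Everything in it is an assumption for illustration: slowSetup fakes expensive setup with a 50 ms sleep, work is a placeholder operation, and the iteration count is pinned to 10 (via the test.benchtime flag) so the setup cost is not amortized over millions of iterations.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
	"testing"
	"time"
)

// slowSetup is a hypothetical stand-in for expensive setup,
// such as reading a large file.
func slowSetup() string {
	time.Sleep(50 * time.Millisecond)
	return strings.Repeat("name:string,", 100)
}

// work is a placeholder for the operation we actually want to measure.
func work(input string) int {
	return strings.Count(input, ",")
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink int

func withoutReset(b *testing.B) {
	input := slowSetup() // counted: the timer is already running
	for i := 0; i < b.N; i++ {
		sink = work(input)
	}
}

func withReset(b *testing.B) {
	input := slowSetup() // not counted
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = work(input)
	}
}

// fixedIterations pins b.N to 10, the equivalent of -benchtime=10x.
func fixedIterations() {
	testing.Init() // registers the testing flags
	if err := flag.Set("test.benchtime", "10x"); err != nil {
		panic(err)
	}
}

func main() {
	fixedIterations()
	fmt.Println("without ResetTimer:", testing.Benchmark(withoutReset))
	fmt.Println("with ResetTimer:   ", testing.Benchmark(withReset))
}
```

Without the reset, the 50 ms of setup is spread over 10 iterations, so each op appears to cost at least 5 ms. With the reset, only the loop is timed.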

Sub-benchmarks (table-driven)

func BenchmarkPluralize_table(b *testing.B) {
	for _, name := range []string{"dog", "cat", "administrator", "child", "deer"} {
		b.Run(name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				Pluralize(name)
			}
		})
	}
}

Output shows each as its own line:

BenchmarkPluralize_table/dog-10 5102937 226.1 ns/op
BenchmarkPluralize_table/cat-10 5012461 228.4 ns/op
BenchmarkPluralize_table/administrator-10 4012461 299.8 ns/op
...

One file, comprehensive coverage of the function's input space. The Go convention for serious benchmarking.
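
One variation worth knowing: give each case a descriptive label rather than reusing the input as the name, so the output says what kind of input each line covers. The sketch below is an assumption-heavy illustration. pluralize and its irregular map are hypothetical stand-ins, and main uses testing.Benchmark only so the file also runs as a plain program. In a real _test.go file you would keep just the BenchmarkPluralizeCases function.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// irregular and pluralize are hypothetical stand-ins for the lesson's Pluralize.
var irregular = map[string]string{"child": "children"}

func pluralize(word string) string {
	if plural, ok := irregular[word]; ok {
		return plural
	}
	if strings.HasSuffix(word, "y") {
		return strings.TrimSuffix(word, "y") + "ies"
	}
	return word + "s"
}

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// cases pairs each input with a label describing its shape.
var cases = []struct {
	name  string
	input string
}{
	{"short", "dog"},
	{"long", "administrator"},
	{"irregular", "child"},
}

// loop returns a benchmark body for one input.
func loop(input string) func(b *testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = pluralize(input)
		}
	}
}

// BenchmarkPluralizeCases is the form to use under "go test":
// one sub-benchmark per labeled case.
func BenchmarkPluralizeCases(b *testing.B) {
	for _, tc := range cases {
		b.Run(tc.name, loop(tc.input))
	}
}

func main() {
	// Standalone equivalent: run each case and print its result.
	for _, tc := range cases {
		fmt.Printf("%-10s %s\n", tc.name, testing.Benchmark(loop(tc.input)))
	}
}
```

Under go test, the sub-benchmark lines would read BenchmarkPluralizeCases/short, /long, and /irregular, which also makes them easy to select with -bench.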

Benchmarks are noisy on a laptop. Background Slack, Spotlight indexing, and thermal throttling all add jitter. For serious numbers, kill background apps, close browser tabs, plug in to power, run with -count=10, and use benchstat (next chapter) to get a statistically sound comparison.
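
For a rough feel of how much jitter a set of runs has, you can compute the mean and the relative spread by hand. This is a back-of-the-envelope sketch, not a substitute for benchstat. The samples are the three ns/op values from the output earlier in this lesson, and meanAndSpread is a helper made up for this example.

```go
package main

import "fmt"

// meanAndSpread returns the mean of the samples and the relative spread,
// defined here as (max - min) / mean.
func meanAndSpread(samples []float64) (mean, spread float64) {
	if len(samples) == 0 {
		return 0, 0
	}
	lo, hi := samples[0], samples[0]
	sum := 0.0
	for _, s := range samples {
		sum += s
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	mean = sum / float64(len(samples))
	return mean, (hi - lo) / mean
}

func main() {
	// ns/op from the three BenchmarkPluralize runs shown above.
	samples := []float64{228.4, 226.1, 227.8}
	mean, spread := meanAndSpread(samples)
	fmt.Printf("mean %.1f ns/op, spread %.1f%%\n", mean, spread*100)
	// prints: mean 227.4 ns/op, spread 1.0%
}
```

A spread around 1% is quiet. If yours is several times larger, the machine was busy and the numbers deserve less trust.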

Quick check

You write `for i := 0; i < 100; i++` instead of `for i := 0; i < b.N; i++` in a benchmark. What happens?

Try it

Write your first three benchmarks:

  1. Pick a function in your service layer. Anything pure and side-effect-free is easiest (a parser, a formatter).
  2. Add func BenchmarkX(b *testing.B) { for i := 0; i < b.N; i++ { X() } } to the test file.
  3. Run with go test -bench=. -benchmem -count=3 ./your/pkg/.
  4. Make a 3-input table-driven version (sub-benchmarks).
  5. Paste the output in notes.md. Circle the most interesting line: the one that's slower than you expected.

What's next

Next lesson: Reading the output. The numbers tell a story. After it, you'll be fluent enough to spot "there's a hidden allocation" or "this function is O(n²) and we didn't notice".
